Grouping on properties from different vertices - gremlin

I have a graph that looks something like the following:
pathway -> pathway_component -> gene -> organism
You can make an example graph like so:
m1 = g.addV('pathway').property('pathway_name', 'M00002').next()
m2 = g.addV('pathway').property('pathway_name', 'M00527').next()
c1 = g.addV('pathway_component').property('name', 'K00001').next()
c2 = g.addV('pathway_component').property('name', 'K00002').next()
c3 = g.addV('pathway_component').property('name', 'K00003').next()
g.addE('partof').from(c1).to(m1).iterate()
g.addE('partof').from(c2).to(m1).iterate()
g.addE('partof').from(c3).to(m2).iterate()
g1 = g.addV('gene').property('name', 'G00001').next()
g2 = g.addV('gene').property('name', 'G00002').next()
g3 = g.addV('gene').property('name', 'G00003').next()
g4 = g.addV('gene').property('name', 'G00004').next()
g5 = g.addV('gene').property('name', 'G00005').next()
g6 = g.addV('gene').property('name', 'G00006').next()
g7 = g.addV('gene').property('name', 'G00007').next()
g8 = g.addV('gene').property('name', 'G00008').next()
g.addE('isa').from(g1).to(c1).iterate()
g.addE('isa').from(g2).to(c3).iterate()
g.addE('isa').from(g3).to(c1).iterate()
g.addE('isa').from(g4).to(c2).iterate()
g.addE('isa').from(g5).to(c3).iterate()
g.addE('isa').from(g6).to(c1).iterate()
g.addE('isa').from(g7).to(c1).iterate()
g.addE('isa').from(g8).to(c2).iterate()
o1 = g.addV('organism').property('name', 'O000001').next()
o2 = g.addV('organism').property('name', 'O000002').next()
o3 = g.addV('organism').property('name', 'O000003').next()
o4 = g.addV('organism').property('name', 'O000004').next()
g.addE('partof').from(g1).to(o1).iterate()
g.addE('partof').from(g2).to(o1).iterate()
g.addE('partof').from(g3).to(o2).iterate()
g.addE('partof').from(g4).to(o2).iterate()
g.addE('partof').from(g5).to(o3).iterate()
g.addE('partof').from(g6).to(o3).iterate()
g.addE('partof').from(g7).to(o4).iterate()
g.addE('partof').from(g8).to(o4).iterate()
I'd like to count the genes per pathway per organism, so that the results look something like:
organism_1 pathway_1 gene_count
organism_1 pathway_2 gene_count
organism_2 pathway_1 gene_count
organism_2 pathway_2 gene_count
But so far I haven't figured it out. I tried the following:
g.V().has('pathway', 'pathway_name', within('M00002', 'M00527')).project('organism', 'pathway', 'count').
by(__.in().hasLabel('pathway_component').
in().hasLabel('gene').
out().hasLabel('organism').
values('name')).
by('pathway_name').
by(__.in().hasLabel('pathway_component').
in().hasLabel('gene').
count())
But it looks like the grouping is wrong:
==>[organism:O000001,pathway:M00002,count:6]
==>[organism:O000001,pathway:M00527,count:2]
In this case it seems like all of the organisms and their counts are being grouped together (there are four organisms) for the two pathways listed. I'd expect to see something like:
O000001 M00002 1
O000001 M00527 1
O000002 M00002 2
O000002 M00527 0
O000003 M00002 1
O000003 M00527 1
O000004 M00002 2
O000004 M00527 0
How can I split out the results by both different organisms and different pathways?

Hopefully the final query below helps. I showed the steps I used to get there, part of which was making sure I understood the structure of your data.
First I wanted to see the shape of the graph.
gremlin> g.V().hasLabel('pathway').
......1> in().hasLabel('pathway_component').
......2> in().hasLabel('gene').
......3> out().hasLabel('organism').
......4> path().
......5> by('pathway_name').
......6> by('name').
......7> by('name').
......8> by('name')
==>[M00002,K00001,G00006,O000003]
==>[M00002,K00001,G00007,O000004]
==>[M00002,K00001,G00001,O000001]
==>[M00002,K00001,G00003,O000002]
==>[M00002,K00002,G00004,O000002]
==>[M00002,K00002,G00008,O000004]
==>[M00527,K00003,G00005,O000003]
==>[M00527,K00003,G00002,O000001]
Then I used path and group to learn a bit more about these relationship groupings.
gremlin> g.V().hasLabel('pathway').
......1> in().hasLabel('pathway_component').
......2> in().hasLabel('gene').
......3> out().hasLabel('organism').as('org').
......4> group().
......5> by(select('org').by('name')).
......6> by(
......7> path().
......8> by('pathway_name').
......9> by('name').
.....10> by('name').
.....11> by('name').fold()).
.....12> unfold()
==>O000004=[path[M00002, K00001, G00007, O000004], path[M00002, K00002, G00008, O000004]]
==>O000003=[path[M00002, K00001, G00006, O000003], path[M00527, K00003, G00005, O000003]]
==>O000002=[path[M00002, K00001, G00003, O000002], path[M00002, K00002, G00004, O000002]]
==>O000001=[path[M00002, K00001, G00001, O000001], path[M00527, K00003, G00002, O000001]]
Finally I changed the above query to nest two groups.
gremlin> g.V().hasLabel('pathway').as('pathway').
......1> in().hasLabel('pathway_component').
......2> in().hasLabel('gene').as('gene').
......3> out().hasLabel('organism').as('org').
......4> group().
......5> by(select('org').by('name')).
......6> by(
......7> group().
......8> by(select('pathway').by('pathway_name')).
......9> by(select('gene').by('name').fold())).
.....10> unfold()
==>O000004={M00002=[G00007, G00008]}
==>O000003={M00002=[G00006], M00527=[G00005]}
==>O000002={M00002=[G00003, G00004]}
==>O000001={M00002=[G00001], M00527=[G00002]}
This yields the organism, the pathway name and the genes.
Building on that I changed the query again to generate the counts. I hope this is close to what you needed.
gremlin> g.V().hasLabel('pathway').as('pathway').
......1> in().hasLabel('pathway_component').
......2> in().hasLabel('gene').as('gene').
......3> out().hasLabel('organism').as('org').
......4> group().
......5> by(select('org').by('name')).
......6> by(
......7> group().
......8> by(select('pathway').by('pathway_name')).
......9> by(select('gene').by('name').fold().count(local))).
.....10> unfold()
==>O000004={M00002=2}
==>O000003={M00002=1, M00527=1}
==>O000002={M00002=2}
==>O000001={M00002=1, M00527=1}
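The nested group() only reports pathways that have at least one gene, so the zero rows from the expected output in the question (for example O000002 / M00527 / 0) are not part of the result. One option is to fill them in on the client. The following is only a sketch in plain Python, not part of the Gremlin query: it rebuilds the sample edges as dictionaries (the names gene_component, component_pathway and gene_organism are made up for illustration), repeats the nested grouping, and then adds a zero for every missing organism/pathway pair.

```python
from collections import defaultdict

# Sample edges from the question, keyed by the 'name' / 'pathway_name' values.
gene_component = {
    "G00001": "K00001", "G00002": "K00003", "G00003": "K00001",
    "G00004": "K00002", "G00005": "K00003", "G00006": "K00001",
    "G00007": "K00001", "G00008": "K00002",
}
component_pathway = {"K00001": "M00002", "K00002": "M00002", "K00003": "M00527"}
gene_organism = {
    "G00001": "O000001", "G00002": "O000001", "G00003": "O000002",
    "G00004": "O000002", "G00005": "O000003", "G00006": "O000003",
    "G00007": "O000004", "G00008": "O000004",
}

# Nested grouping: organism -> pathway -> number of genes.
counts = defaultdict(lambda: defaultdict(int))
for gene, component in gene_component.items():
    pathway = component_pathway[component]
    counts[gene_organism[gene]][pathway] += 1

# Fill in zero rows so every organism/pathway pair is reported.
pathways = sorted(set(component_pathway.values()))
rows = [
    (organism, pathway, counts[organism].get(pathway, 0))
    for organism in sorted(counts)
    for pathway in pathways
]

for row in rows:
    print(*row)
```

With real data the same zero-filling could be applied to the map returned by the Gremlin query instead of rebuilding the edges locally.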

Related

Neptune - How to get distance to all nodes with proportional weights gremlin

I'm having a difficult time figuring out a Gremlin query for the following scenario. Here is the directed graph (it may be cyclic).
I want to get the top N favorable nodes, starting from node "Jane", where favor is defined as:
favor(Jane->Lisa) = edge(Jane,Lisa) / total weight of the outward edges of Jane
favor(Jane->Thomas) = favor(Jane->Thomas, direct edge) + favor(Jane->Lisa) * favor(Lisa->Thomas)
favor(Jane->Jerryd) = favor(Jane->Thomas) * favor(Thomas->Jerryd) + favor(Jane->Lisa) * favor(Lisa->Jerryd)
favor(Jane->Jerryd) = [favor(Jane->Thomas) + favor(Jane->Lisa) * favor(Lisa->Thomas)] * favor(Thomas->Jerryd) + favor(Jane->Lisa) * favor(Lisa->Jerryd)
and so .. on
Here is the same graph with a hand calculation of what I mean:
This is fairly simple to traverse programmatically, but I'm not sure how exactly to query it with Gremlin or even SPARQL.
Here is the query to create this example graph:
g
.addV('person').as('1').property(single, 'name', 'jane')
.addV('person').as('2').property(single, 'name', 'thomas')
.addV('person').as('3').property(single, 'name', 'lisa')
.addV('person').as('4').property(single, 'name', 'wyd')
.addV('person').as('5').property(single, 'name', 'jerryd')
.addE('favor').from('1').to('2').property('weight', 10)
.addE('favor').from('1').to('3').property('weight', 20)
.addE('favor').from('3').to('2').property('weight', 90)
.addE('favor').from('2').to('4').property('weight', 50)
.addE('favor').from('2').to('5').property('weight', 90)
.addE('favor').from('3').to('5').property('weight', 100)
All I'm looking for is:
[Lisa, computedFavor]
[Thomas, computedFavor]
[Jerryd, computedFavor]
[Wyd, computedFavor]
I'm struggling to incorporate the cyclic case when adjusting the weights. This is as far as I've been able to get with a query: https://gremlify.com/f2r0zy03oxc/2
g.V().has('name','jane'). // our starting node
repeat(
union(
outE() // get only outwards edges
).
otherV().simplePath()). // produce simple path
emit().
times(10). // max depth of 10
path(). // attain path
by(valueMap())
Addressing comments from Stephen Mallette:
favor(Jane->Jerryd) =
favor(Jane->Thomas) * favor(Thomas->Jerryd)
+ favor(Jane->Lisa) * favor(Lisa->Jerryd)
// note we can expand on favor(Jane->Thomas) in above expression
//
// favor(Jane->Thomas) is favor(Jane->Thomas)#directEdge +
// favor(Jane->Lisa) * favor(Lisa->Thomas)
//
Calculation Example
Jane to Lisa => 20/(10+20) => 2/3
Lisa to Jerryd => 100/(100+90) => 10/19
Jane to Lisa to Jerryd => 2/3*(10/19)
Jane to Thomas (directly) => 10/(10+20) => 1/3
Jane to Lisa to Thomas => 2/3 * 90/(100+90) => 2/3 * 9/19
Jane to Thomas => 1/3 + (2/3 * 9/19)
Thomas to Jerryd => 90/(90+50) => 9/14
Jane to Thomas to Jerryd => [1/3 + (2/3 * 9/19)] * (9/14)
Jane to Jerryd:
= Jane to Lisa to Jerryd + Jane to Thomas to Jerryd
= 2/3 * (10/19) + [1/3 + (2/3 * 9/19)] * (9/14)
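The arithmetic above can be checked with exact fractions. This is just a small sketch in Python; the variable names are only illustrative and the weights are the ones from the sample graph.

```python
from fractions import Fraction

# Edge weights from the sample graph in the question.
jane_thomas, jane_lisa = 10, 20
lisa_thomas, lisa_jerryd = 90, 100
thomas_wyd, thomas_jerryd = 50, 90

# Total outward weight per vertex.
jane_total = jane_thomas + jane_lisa
lisa_total = lisa_thomas + lisa_jerryd
thomas_total = thomas_wyd + thomas_jerryd

# Jane to Lisa: only the direct edge.
favor_lisa = Fraction(jane_lisa, jane_total)

# Jane to Thomas: direct edge plus the route through Lisa.
favor_thomas = (Fraction(jane_thomas, jane_total)
                + favor_lisa * Fraction(lisa_thomas, lisa_total))

# Jane to Jerryd: through Lisa plus through Thomas.
favor_jerryd = (favor_lisa * Fraction(lisa_jerryd, lisa_total)
                + favor_thomas * Fraction(thomas_jerryd, thomas_total))

print(favor_lisa, favor_thomas, favor_jerryd, float(favor_jerryd))
```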
Here is some rough pseudocode:
from collections import deque

def get_favors(graph, label="jane", starting_favor=1):
    # graph.findNode and node.out_edges stand in for whatever graph API is used
    start = graph.findNode(label)
    queue = deque([(start, starting_favor)])
    favors = {}
    seen = set()
    while queue:
        node, curr_favor = queue.popleft()
        # get total weight (out edges) from this node
        total_favor = 0
        for (edgeW, outNode) in node.out_edges:
            total_favor = total_favor + edgeW
        for (edgeW, outNode) in node.out_edges:
            # if there are no favors for this node
            # take current favor and provide proportional favor
            if outNode not in favors:
                favors[outNode] = curr_favor * (edgeW / total_favor)
            # it already has some favor, so we add to it
            # we add proportional favor
            else:
                favors[outNode] += curr_favor * (edgeW / total_favor)
            # if we have seen this edge and node, ignore
            # otherwise, traverse
            if (edgeW, outNode) not in seen:
                seen.add((edgeW, outNode))
                queue.append((outNode, favors[outNode]))
    # sort favor by value and return top X
    return favors
Here is a Gremlin query that I believe applies your formula correctly. I'll paste the full final query first then say a few words about the steps involved.
gremlin> g.withSack(1).V().
......1> has('name','jane').
......2> repeat(outE().
......3> sack(mult).
......4> by(project('w','f').
......5> by('weight').
......6> by(outV().outE().values('weight').sum()).
......7> math('w / f')).
......8> inV().
......9> simplePath()).
.....10> until(has('name','jerryd')).
.....11> sack().
.....12> sum()
==>0.768170426065163
The query starts with Jane and keeps traversing until all paths to jerryd have been inspected. Along the way, each traverser maintains a sack containing the product of the calculated weight values for each relationship. The calculation on line 6 sums the weights of all the edges leaving the prior vertex, and the math step on line 7 divides the weight on the current edge by that sum. At the very end, the computed results are added together on line 12. If you remove the final sum step you can see the intermediate results.
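For readers who find it easier to follow in imperative code, here is a rough Python sketch of the same idea: walk every simple path from the start to the target and carry a running product (the "sack") along each one. It is not how Gremlin executes the traversal, only an illustration; the edges dictionary and the path_favors function are made-up names, and the weights are the sample data from the question.

```python
# Weighted out-edges of the sample graph: vertex -> [(neighbour, weight), ...]
edges = {
    "jane": [("thomas", 10), ("lisa", 20)],
    "thomas": [("wyd", 50), ("jerryd", 90)],
    "lisa": [("thomas", 90), ("jerryd", 100)],
    "wyd": [],
    "jerryd": [],
}

def path_favors(start, target):
    """Yield (path, favor) for every simple path from start to target."""
    stack = [([start], 1.0)]                      # initial sack of 1, like withSack(1)
    while stack:
        path, sack = stack.pop()
        node = path[-1]
        if node == target and len(path) > 1:      # stop here, like until(...)
            yield path, sack
            continue
        total = sum(weight for _, weight in edges[node])
        for neighbour, weight in edges[node]:
            if neighbour not in path:             # no repeated vertices, like simplePath()
                # multiply the sack by weight / total outward weight
                stack.append((path + [neighbour], sack * weight / total))

results = list(path_favors("jane", "jerryd"))
total_favor = sum(favor for _, favor in results)

for path, favor in results:
    print(path, favor)
print(total_favor)
```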
gremlin> g.withSack(1).V().
......1> has('name','jane').
......2> repeat(outE().
......3> sack(mult).
......4> by(project('w','f').
......5> by('weight').
......6> by(outV().outE().values('weight').sum()).
......7> math('w / f')).
......8> inV().
......9> simplePath()).
.....10> until(has('name','jerryd')).
.....11> sack()
==>0.2142857142857143
==>0.3508771929824561
==>0.2030075187969925
To see the routes taken a path step can be added to the query.
gremlin> g.withSack(1).V().
......1> has('name','jane').
......2> repeat(outE().
......3> sack(mult).
......4> by(project('w','f').
......5> by('weight').
......6> by(outV().outE().values('weight').sum()).
......7> math('w / f')).
......8> inV().
......9> simplePath()).
.....10> until(has('name','jerryd')).
.....11> local(
.....12> union(
.....13> path().
.....14> by('name').
.....15> by('weight'),
.....16> sack()).fold())
==>[[jane,10,thomas,90,jerryd],0.2142857142857143]
==>[[jane,20,lisa,100,jerryd],0.3508771929824561]
==>[[jane,20,lisa,90,thomas,90,jerryd],0.2030075187969925]
This approach also takes account of adding in any direct connections, per your formula as we can see if we use Thomas as the target.
gremlin> g.withSack(1).V().
......1> has('name','jane').
......2> repeat(outE().
......3> sack(mult).
......4> by(project('w','f').
......5> by('weight').
......6> by(outV().outE().values('weight').sum()).
......7> math('w / f')).
......8> inV().
......9> simplePath()).
.....10> until(has('name','thomas')).
.....11> local(
.....12> union(
.....13> path().
.....14> by('name').
.....15> by('weight'),
.....16> sack()).fold())
==>[[jane,10,thomas],0.3333333333333333]
==>[[jane,20,lisa,90,thomas],0.3157894736842105]
These extra steps are not needed, but having the path included is useful when debugging queries like this. Also, purely for general interest, you can get to the final answer from here as well, although the very first query I included is all you really need.
g.withSack(1).V().
has('name','jane').
repeat(outE().
sack(mult).
by(project('w','f').
by('weight').
by(outV().outE().values('weight').sum()).
math('w / f')).
inV().
simplePath()).
until(has('name','thomas')).
local(
union(
path().
by('name').
by('weight'),
sack()).fold().tail(local)).
sum()
==>0.6491228070175439
If any of this is unclear or I have misunderstood the formula, please let me know.
EDITED to add
To find the results for all people reachable from Jane I had to modify the query a little bit. The unfold at the end is just to make the results easier to read.
gremlin> g.withSack(1).V().
......1> has('name','jane').
......2> repeat(outE().
......3> sack(mult).
......4> by(project('w','f').
......5> by('weight').
......6> by(outV().outE().values('weight').sum()).
......7> math('w / f')).
......8> inV().
......9> simplePath()).
.....10> emit().
.....11> local(
.....12> union(
.....13> path().
.....14> by('name').
.....15> by('weight').unfold(),
.....16> sack()).fold()).
.....17> group().
.....18> by(tail(local,2).limit(local,1)).
.....19> by(tail(local).sum()).
.....20> unfold()
==>jerryd=0.768170426065163
==>wyd=0.23182957393483708
==>lisa=0.6666666666666666
==>thomas=0.6491228070175439
The final group step on line 17 uses the path results to calculate the total favor for each unique name found. To see the paths you can run the query with the group step removed.
gremlin> g.withSack(1).V().
......1> has('name','jane').
......2> repeat(outE().
......3> sack(mult).
......4> by(project('w','f').
......5> by('weight').
......6> by(outV().outE().values('weight').sum()).
......7> math('w / f')).
......8> inV().
......9> simplePath()).
.....10> emit().
.....11> local(
.....12> union(
.....13> path().
.....14> by('name').
.....15> by('weight').unfold(),
.....16> sack()).fold())
==>[jane,10,thomas,0.3333333333333333]
==>[jane,20,lisa,0.6666666666666666]
==>[jane,10,thomas,50,wyd,0.11904761904761904]
==>[jane,10,thomas,90,jerryd,0.2142857142857143]
==>[jane,20,lisa,90,thomas,0.3157894736842105]
==>[jane,20,lisa,100,jerryd,0.3508771929824561]
==>[jane,20,lisa,90,thomas,50,wyd,0.11278195488721804]
==>[jane,20,lisa,90,thomas,90,jerryd,0.2030075187969925]
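The grouping can also be done on the client from rows like the ones above. This is only a sketch in Python: it takes the second-to-last element of each row as the key (the last name on the path) and sums the last element (the sack value), which is what the two by() modulators of the group step select. The rows are copied from the output above.

```python
from collections import defaultdict

# Rows copied from the output above: flattened path followed by the sack value.
rows = [
    ["jane", 10, "thomas", 0.3333333333333333],
    ["jane", 20, "lisa", 0.6666666666666666],
    ["jane", 10, "thomas", 50, "wyd", 0.11904761904761904],
    ["jane", 10, "thomas", 90, "jerryd", 0.2142857142857143],
    ["jane", 20, "lisa", 90, "thomas", 0.3157894736842105],
    ["jane", 20, "lisa", 100, "jerryd", 0.3508771929824561],
    ["jane", 20, "lisa", 90, "thomas", 50, "wyd", 0.11278195488721804],
    ["jane", 20, "lisa", 90, "thomas", 90, "jerryd", 0.2030075187969925],
]

favor_by_person = defaultdict(float)
for row in rows:
    person = row[-2]   # last name on the path
    favor = row[-1]    # sack value for that path
    favor_by_person[person] += favor

# Sort by favor, highest first, to pick the top N.
ranking = sorted(favor_by_person.items(), key=lambda item: item[1], reverse=True)
for person, favor in ranking:
    print(person, favor)
```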
This answer is quite elegant and best for the environment involved with Neptune and Python. I offer a second for reference, in case others come across this question. From the moment I saw this question I could only ever picture it as being solved with a VertexProgram in OLAP fashion with a GraphComputer. As a result, I had a hard time thinking of it any other way. Of course, use of a VertexProgram requires a JVM language like Java and will not work directly with Neptune. I suppose my closest workaround would have been to use Java, grab a subgraph() from Neptune and then run the custom VertexProgram in TinkerGraph locally which would be quite speedy to do.
More generally, without the Python/Neptune requirements, converting an algorithm to a VertexProgram is not a bad approach depending on the nature of the graph and the amount of data that needs to be traversed. As there isn't a lot of content out there on this topic I thought I'd offer the core of the code for it here. This is the guts of it:
@Override
public void execute(final Vertex vertex, final Messenger<Double> messenger, final Memory memory) {
    // on the first pass calculate the "total favor" for all vertices
    // and pass the calculated current favor forward along incident edges
    // only for the "start vertex"
    if (memory.isInitialIteration()) {
        copyHaltedTraversersFromMemory(vertex);
        final boolean startVertex = vertex.value("name").equals(nameOfStartVertrex);
        final double initialFavor = startVertex ? 1d : 0d;
        vertex.property(VertexProperty.Cardinality.single, FAVOR, initialFavor);
        vertex.property(VertexProperty.Cardinality.single, TOTAL_FAVOR,
            IteratorUtils.stream(vertex.edges(Direction.OUT)).mapToDouble(e -> e.value("weight")).sum());
        if (startVertex) {
            final Iterator<Edge> incidents = vertex.edges(Direction.OUT);
            memory.add(VOTE_TO_HALT, !incidents.hasNext());
            while (incidents.hasNext()) {
                final Edge incident = incidents.next();
                messenger.sendMessage(MessageScope.Global.of(incident.inVertex()),
                    (double) incident.value("weight") / (double) vertex.value(TOTAL_FAVOR));
            }
        }
    } else {
        // on future passes, sum all the incoming "favor" and add it to
        // the "favor" property of each vertex. then once again pass the
        // current favor to incident edges. this will keep happening
        // until the message passing stops.
        final Iterator<Double> messages = messenger.receiveMessages();
        final boolean hasMessages = messages.hasNext();
        if (hasMessages) {
            double adjacentFavor = IteratorUtils.reduce(messages, 0.0d, Double::sum);
            vertex.property(VertexProperty.Cardinality.single, FAVOR, (double) vertex.value(FAVOR) + adjacentFavor);
            final Iterator<Edge> incidents = vertex.edges(Direction.OUT);
            memory.add(VOTE_TO_HALT, !incidents.hasNext());
            while (incidents.hasNext()) {
                final Edge incident = incidents.next();
                messenger.sendMessage(MessageScope.Global.of(incident.inVertex()),
                    adjacentFavor * ((double) incident.value("weight") / (double) vertex.value(TOTAL_FAVOR)));
            }
        }
    }
}
The above is then executed as:
ComputerResult result = graph.compute().program(FavorVertexProgram.build().name("jane").create()).submit().get();
GraphTraversalSource rg = result.graph().traversal();
Traversal elements = rg.V().elementMap();
and that "elements" traversal yields:
{id=0, label=person, ^favor=1.0, name=jane, ^totalFavor=30.0}
{id=2, label=person, ^favor=0.6491228070175439, name=thomas, ^totalFavor=140.0}
{id=4, label=person, ^favor=0.6666666666666666, name=lisa, ^totalFavor=190.0}
{id=6, label=person, ^favor=0.23182957393483708, name=wyd, ^totalFavor=0.0}
{id=8, label=person, ^favor=0.768170426065163, name=jerryd, ^totalFavor=0.0}
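To make the message-passing idea easier to follow without a GraphComputer, here is a rough simulation in plain Python. It is a sketch of the same scheme, not the VertexProgram itself: the edges dictionary and the favor_by_messages function are invented for illustration, and the loop as written only terminates when the messages eventually stop (as they do on this acyclic sample graph).

```python
# Weighted out-edges of the sample graph: vertex -> [(neighbour, weight), ...]
edges = {
    "jane": [("thomas", 10), ("lisa", 20)],
    "thomas": [("wyd", 50), ("jerryd", 90)],
    "lisa": [("thomas", 90), ("jerryd", 100)],
    "wyd": [],
    "jerryd": [],
}

def favor_by_messages(start):
    # "Total favor" is the summed weight of a vertex's outgoing edges.
    total = {v: sum(w for _, w in out) for v, out in edges.items()}
    favor = {v: (1.0 if v == start else 0.0) for v in edges}

    # First pass: only the start vertex sends messages.
    inbox = {}
    for neighbour, weight in edges[start]:
        inbox[neighbour] = inbox.get(neighbour, 0.0) + weight / total[start]

    # Later passes: add the incoming favor, then forward it proportionally.
    while inbox:
        outbox = {}
        for vertex, incoming in inbox.items():
            favor[vertex] += incoming
            for neighbour, weight in edges[vertex]:
                share = incoming * weight / total[vertex]
                outbox[neighbour] = outbox.get(neighbour, 0.0) + share
        inbox = outbox
    return favor

favors = favor_by_messages("jane")
for name, value in favors.items():
    print(name, value)
```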

How to access stored variable from repeat step in tinkerpop

I am just two days into Gremlin. I have a set of vertices and colored edges. I want to find a path from S2 to D2. If I enter a black vertex through a green edge (G1-B1), then I have to come out only through a green edge (B2-G2). I shouldn't come out through a red edge.
The query below works, but I can’t hardcode the colors (has('color',within("green")) in the 3rd line).
g.V().hasLabel("S2").repeat(outE("tx").choose(values("type")).
option("multiplex",aggregate(local,"colors").by("color").inV()).
option("demultiplex",has('color',within("green")).inV()).
option(none,__.inV()).
simplePath()).until(hasLabel("D2")).path().by(label())
So I tried the query below, but it doesn’t return any path. If my edge has type “multiplex”, I store the color. If my edge has type “demultiplex”, I read the color from the store.
g.V().hasLabel("S2").repeat(outE("tx").choose(values("type")).
option("multiplex",aggregate("colors").by("color").inV()).
option("demultiplex",has("color",within(select("colors").unfold())).inV()).
option(none,__.inV()).
simplePath()).until(hasLabel("D2")).path().by(label())
Below code populates the graph
Vertex s1 = g.addV("S1").next();
Vertex s2 = g.addV("S2").next();
Vertex d1 = g.addV("D1").next();
Vertex d2 = g.addV("D2").next();
Vertex r1 = g.addV("R1").next();
Vertex r2 = g.addV("R2").next();
Vertex r3 = g.addV("R3").next();
Vertex r4 = g.addV("R4").next();
Vertex g1 = g.addV("G1").next();
Vertex g2 = g.addV("G2").next();
Vertex g3 = g.addV("G3").next();
Vertex g4 = g.addV("G4").next();
Vertex b1 = g.addV("B1").next();
Vertex b2 = g.addV("B2").next();
Vertex b3 = g.addV("B3").next();
Vertex b4 = g.addV("B4").next();
g.V(s1).addE("tx").to(r1).property("type","straight").next();
g.V(r1).addE("tx").to(b1).property("color","red").property("type","multiplex").next();
g.V(s2).addE("tx").to(g1).property("type","straight").next();
g.V(g1).addE("tx").to(b1).property("color","green").property("type","multiplex").next();
g.V(b1).addE("tx").to(b2).property("type","straight").next();
g.V(b2).addE("tx").to(r2).property("color","red").property("type","demultiplex").next();
g.V(b2).addE("tx").to(g2).property("color","green").property("type","demultiplex").next();
g.V(r2).addE("tx").to(r3).property("type","straight").next();
g.V(g2).addE("tx").to(g3).property("type","straight").next();
g.V(r3).addE("tx").to(b3).property("color","red").property("type","multiplex").next();
g.V(g3).addE("tx").to(b3).property("color","green").property("type","multiplex").next();
g.V(b3).addE("tx").to(b4).property("type","straight").next();
g.V(b4).addE("tx").to(g4).property("color","green").property("type","demultiplex").next();
g.V(g4).addE("tx").to(d2).property("type","straight").next();
g.V(b4).addE("tx").to(r4).property("color","red").property("type","demultiplex").next();
g.V(r4).addE("tx").to(d1).property("type","straight").next();
You were pretty close. This syntax is always tempting:
has("color",within(select("colors").unfold()))
but it doesn't work that way as you've found. That P syntax doesn't take a Traversal that way. You need to use a form of where() when you need to reference a side-effect (i.e. "colors").
gremlin> g.V().hasLabel("S2").
......1> repeat(outE("tx").
......2> choose(values("type")).
......3> option("multiplex",aggregate(local,"colors").by("color").inV()).
......4> option("demultiplex", filter(values('color').as('c').
......5> where('c',eq('colors')).
......6> by().
......7> by(unfold().tail())).inV()).
......8> option(none,__.inV()).
......9> simplePath()).
.....10> until(hasLabel("D2")).
.....11> path().by(label)
==>[S2,tx,G1,tx,B1,tx,B2,tx,G2,tx,G3,tx,B3,tx,B4,tx,G4,tx,D2]
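The rule being enforced is "leave a demultiplex edge only through the color stored most recently by a multiplex edge". As an illustration only, here is a Python sketch of that rule on the graph from the question. It keeps the stored colors per path rather than in a global side-effect as aggregate() does, which is an assumption of the sketch; the edges list and find_paths function are invented names.

```python
# (source, target, type, color) for each "tx" edge in the question's graph.
edges = [
    ("S1", "R1", "straight", None),
    ("R1", "B1", "multiplex", "red"),
    ("S2", "G1", "straight", None),
    ("G1", "B1", "multiplex", "green"),
    ("B1", "B2", "straight", None),
    ("B2", "R2", "demultiplex", "red"),
    ("B2", "G2", "demultiplex", "green"),
    ("R2", "R3", "straight", None),
    ("G2", "G3", "straight", None),
    ("R3", "B3", "multiplex", "red"),
    ("G3", "B3", "multiplex", "green"),
    ("B3", "B4", "straight", None),
    ("B4", "G4", "demultiplex", "green"),
    ("G4", "D2", "straight", None),
    ("B4", "R4", "demultiplex", "red"),
    ("R4", "D1", "straight", None),
]

def find_paths(start, goal):
    paths = []
    stack = [([start], [])]          # (vertices visited, colors stored so far)
    while stack:
        path, colors = stack.pop()
        if path[-1] == goal:
            paths.append(path)
            continue
        for source, target, kind, color in edges:
            if source != path[-1] or target in path:   # simple paths only
                continue
            if kind == "multiplex":
                # Remember the color we entered through.
                stack.append((path + [target], colors + [color]))
            elif kind == "demultiplex":
                # Only leave through the color stored most recently.
                if colors and colors[-1] == color:
                    stack.append((path + [target], colors))
            else:
                stack.append((path + [target], colors))
    return paths

print(find_paths("S2", "D2"))
```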

TinkerPop: Generic Query to combine and filter multiple traversals

Sample data: TinkerPop Modern Graph
Conditions:
Is vadas connected to lop within 2 hops
Is vadas connected to peter within 3 hops
Is vadas connected to does-not-exist within 1 hop (a search that won't give any results)
Dummy searches with expected results
Conditions 1 AND 2
=> [vadas-marko-lop, vadas-marko-lop-peter]
Conditions 1 OR 3
=> [vadas-marko-lop]
What I was able to get
Conditions 1 AND 2
gremlin> g.V().has("person", "name", "vadas").as("from")
.select("from").as("to1").repeat(both().as("to1")).times(2).emit().has("software", "name", "lop")
.select("from").as("to2").repeat(both().as("to2")).times(3).emit().has("person", "name", "peter")
.project("a", "b")
.by(select(all, "to1").unfold().values("name").fold())
.by(select(all, "to2").unfold().values("name").fold())
==>[a:[vadas,marko,lop],b:[vadas,marko,lop,peter]]
Conditions 1 OR 3
gremlin> g.V().has("person", "name", "vadas").as("nodes")
.union(repeat(both().as("nodes")).times(2).emit().has("software", "name", "lop"),
out().has("x", "y", "does-not-exist").as("nodes"))
.project("a")
.by(select(all, "nodes").unfold().values("name").fold())
==>[a:[vadas,marko,lop]]
So how can I achieve this? I have two different query formats; is there a way to write a single query format that can do both?
This attempt did not work. Is anything wrong here? It does not return the nodes that have been traversed:
g.V().has("person", "name", "vadas").as("nodes")
.or(
repeat(both().as("nodes")).times(2).emit().has("software", "name", "lop"),
repeat(both().as("nodes")).times(3).emit().has("person", "name", "peter")
)
.project("a").by(select(all, "nodes").unfold().values("name").fold())
==>[a:[vadas]]
// Expect paths to be printed here vadas..lop, vadas...peter
I don't know if I understand what you're after, but if you just need something like a query template, then maybe this will help:
gremlin> conditions = [
......1> [filter: {has("software", "name", "lop")}, distance: 2],
......2> [filter: {has("person", "name", "peter")}, distance: 3],
......3> [filter: {has("x", "y", "does-not-exist")}, distance: 1]]
==>[filter:groovysh_evaluate$_run_closure1@378bd86d,distance:2]
==>[filter:groovysh_evaluate$_run_closure2@2189e7a7,distance:3]
==>[filter:groovysh_evaluate$_run_closure3@69b2f8e5,distance:1]
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().has("person", "name", "vadas").
......1> union(repeat(both().simplePath()).
......2> times(conditions[0].distance).
......3> emit().
......4> filter(conditions[0].filter()).store("x"),
......5> repeat(both().simplePath()).
......6> times(conditions[1].distance).
......7> emit().
......8> filter(conditions[1].filter()).store("x")).
......9> barrier().
.....10> filter(select("x").
.....11> and(unfold().filter(conditions[0].filter()),
.....12> unfold().filter(conditions[1].filter()))).
.....13> path().
.....14> by("name")
==>[vadas,marko,lop]
==>[vadas,marko,lop,peter]
gremlin> g.V().has("person", "name", "vadas").
......1> union(repeat(both().simplePath()).
......2> times(conditions[0].distance).
......3> emit().
......4> filter(conditions[0].filter()).store("x"),
......5> repeat(both().simplePath()).
......6> times(conditions[2].distance).
......7> emit().
......8> filter(conditions[2].filter()).store("x")).
......9> barrier().
.....10> filter(select("x").
.....11> or(unfold().filter(conditions[0].filter()),
.....12> unfold().filter(conditions[2].filter()))).
.....13> path().
.....14> by("name")
==>[vadas,marko,lop]
And a little more abstraction should make it clearer that the two queries only differ in 1 step (and vs or):
apply = { condition ->
repeat(both().simplePath()).
times(condition.distance).
emit().
filter(condition.filter()).store("x")
}
verify = { condition ->
unfold().filter(condition.filter())
}
// condition 1 AND 2
g.V().has("person", "name", "vadas").
union(apply(conditions[0]),
apply(conditions[1])).
barrier().
filter(select("x").
and(verify(conditions[0]),
verify(conditions[1]))).
path().
by("name")
// condition 1 OR 3
g.V().has("person", "name", "vadas").
union(apply(conditions[0]),
apply(conditions[2])).
barrier().
filter(select("x").
or(verify(conditions[0]),
verify(conditions[2]))).
path().
by("name")
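The same template idea (collect the matching paths per condition, then keep them only if all or any of the conditions matched) can be written out in plain Python to make the AND/OR switch explicit. This is just a sketch over a hand-copied version of the modern graph; labels, edge_list, matching_paths and combine are names invented for the example.

```python
# Vertices and (undirected, for both()) edges of the TinkerPop modern graph.
labels = {
    "marko": "person", "vadas": "person", "josh": "person",
    "peter": "person", "lop": "software", "ripple": "software",
}
edge_list = [
    ("marko", "vadas"), ("marko", "josh"), ("marko", "lop"),
    ("josh", "ripple"), ("josh", "lop"), ("peter", "lop"),
]
neighbours = {name: set() for name in labels}
for a, b in edge_list:
    neighbours[a].add(b)
    neighbours[b].add(a)

def matching_paths(start, label, name, distance):
    """Simple paths of at most `distance` hops that end on (label, name)."""
    found = []
    frontier = [[start]]
    for _ in range(distance):
        next_frontier = []
        for path in frontier:
            for n in sorted(neighbours[path[-1]]):
                if n in path:                      # keep paths simple
                    continue
                new_path = path + [n]
                next_frontier.append(new_path)
                if labels[n] == label and n == name:
                    found.append(new_path)
        frontier = next_frontier
    return found

def combine(start, conditions, mode):
    """mode is 'and' or 'or'; returns all matching paths if the check passes."""
    per_condition = [matching_paths(start, *c) for c in conditions]
    check = all if mode == "and" else any
    if not check(per_condition):                   # an empty list counts as "no match"
        return []
    return [path for paths in per_condition for path in paths]

conditions = [
    ("software", "lop", 2),
    ("person", "peter", 3),
    ("x", "does-not-exist", 1),
]
print(combine("vadas", [conditions[0], conditions[1]], "and"))
print(combine("vadas", [conditions[0], conditions[2]], "or"))
```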

How can I get the combined script result for janusgraph?

Graph is below:
gremlin> a = graph.addVertex("name", "alice")
gremlin> b = graph.addVertex("name", "bobby")
gremlin> c = graph.addVertex("name", "cindy")
gremlin> d = graph.addVertex("name", "david")
gremlin> e = graph.addVertex("name", "eliza")
gremlin> a.addEdge("rates",b,"tag","ruby","value",9)
gremlin> b.addEdge("rates",c,"tag","ruby","value",8)
gremlin> c.addEdge("rates",d,"tag","ruby","value",7)
gremlin> d.addEdge("rates",e,"tag","ruby","value",6)
gremlin> e.addEdge("rates",a,"tag","java","value",10)
I have 3 scripts below:
Script #1
gremlin> g.V().has('name','alice').
repeat(out()).
until(has('name','alice')).
cyclicPath().
path().by('name')
==>[alice,bobby,cindy,david,eliza,alice]
Script #2
gremlin> g.V().has('name','alice').
repeat(outE().inV()).
until(has('name','alice')).
cyclicPath().
group().
by('name').
by(path().unfold().has('value').values('value').fold()).
next()
==>alice=[9, 8, 7, 6, 10]
Script #3
gremlin> g.V().has('name','alice').
repeat(outE().inV()).
until(has('name','alice')).
cyclicPath().
group().
by('name').
by(path().unfold().has('value').values('value').fold()).
next().collect { k, v ->
k + '=' + v.withIndex().collect { Integer it, Integer idx ->
return it * (1/(idx + 1))
}.inject(0.0) { acc,i -> acc + i }
}
==>alice=18.8333333331
My question is: how can I get the result listed below? It just combines the 3:
alice=[alice,bobby,cindy,david,eliza,alice]=[9, 8, 7, 6, 10]=18.8333333331
It's probably much easier or at least more maintainable to execute 3 queries and then merge the results as suggested by David. However, if you want to do it all in a single query, you can:
g.V().has('name','alice').as('v').
repeat(outE().as('e').inV().as('v')).
until(has('name','alice')).
store('a').
by('name').
store('a').
by(select(all, 'v').unfold().values('name').fold()).
store('a').
by(select(all, 'e').unfold().
store('x').
by(union(values('value'),
select('x').count(local)).fold()).
cap('x').
store('a').
by(unfold().limit(local, 1).fold()).unfold().
sack(assign).
by(constant(1d)).
sack(div).
by(union(constant(1d),
tail(local, 1)).sum()).
sack(mult).
by(limit(local, 1)).
sack().sum()).
cap('a')
Using your sample graph:
gremlin> g.V().has('name','alice').as('v').
......1> repeat(outE().as('e').inV().as('v')).
......2> until(has('name','alice')).
......3> store('a').
......4> by('name').
......5> store('a').
......6> by(select(all, 'v').unfold().values('name').fold()).
......7> store('a').
......8> by(select(all, 'e').unfold().
......9> store('x').
.....10> by(union(values('value'),
.....11> select('x').count(local)).fold()).
.....12> cap('x').
.....13> store('a').
.....14> by(unfold().limit(local, 1).fold()).unfold().
.....15> sack(assign).
.....16> by(constant(1d)).
.....17> sack(div).
.....18> by(union(constant(1d),
.....19> tail(local, 1)).sum()).
.....20> sack(mult).
.....21> by(limit(local, 1)).
.....22> sack().sum()).
.....23> cap('a')
==>[alice,[alice,bobby,cindy,david,eliza,alice],[9,8,7,6,10],18.833333333333332]
It has some benefits to do it all in a single query, especially as you don't have to traverse the same path over and over again, but again, it's hard to maintain such complex queries. It's probably better to just return the full path and then build the expected result on the client side.
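As a rough idea of what the client-side version could look like, here is a sketch in Python. It assumes the client has already received the path names and edge values (hard-coded here from the sample output) and applies the same 1/(idx + 1) weighting as Script #3; the variable names are only illustrative.

```python
# Values as returned by the simpler queries in the question.
names = ["alice", "bobby", "cindy", "david", "eliza", "alice"]
values = [9, 8, 7, 6, 10]

# Weight the value at zero-based position idx by 1 / (idx + 1), as in Script #3.
score = sum(value / (idx + 1) for idx, value in enumerate(values))

# Combine the three parts into one line: key, path, values, weighted sum.
line = "{}=[{}]={}={}".format(names[0], ",".join(names), values, score)
print(line)
```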
Gremlin code is executed in a Groovy executor, so all Groovy operators are valid here. You can add your results to a list and return the list, i.e. def l = []; l << result1; l << result2; l;.

How can I use until in janusgraph?

gremlin> a = graph.addVertex("name", "alice")
gremlin> b = graph.addVertex("name", "bobby")
gremlin> c = graph.addVertex("name", "cindy")
gremlin> d = graph.addVertex("name", "david")
gremlin> e = graph.addVertex("name", "eliza")
gremlin> a.addEdge("rates",b,"tag","ruby","value",9)
gremlin> b.addEdge("rates",c,"tag","ruby","value",8)
gremlin> c.addEdge("rates",d,"tag","ruby","value",7)
gremlin> d.addEdge("rates",e,"tag","ruby","value",6)
gremlin> e.addEdge("rates",a,"tag","java","value",9)
g.V().has('name', 'alice').repeat(out()).times(6).cyclicPath().path().by('name')
I want the path to end at the alice node, and I want to repeat the step without having to specify the number of times as 6. The requirement is to get all the loops from alice, or all the loops in the graph.
You can refer to the Cycle Detection section in the TinkerPop Recipes - it adapts fairly easily to your sample graph:
gremlin> g.V().has('name', 'alice').as('a').
......1> repeat(out().simplePath()).
......2> emit(loops().is(gt(1))).
......3> both().where(eq('a')).
......4> path().
......5> by('name').
......6> dedup().
......7> by(unfold().order().dedup().fold())
==>[alice,bobby,cindy,david,eliza,alice]
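For comparison, the underlying idea (extend simple paths until an edge leads back to the start) looks like this in plain Python. It is only a sketch on the sample graph and follows outgoing edges to close the loop, so it is not a line-by-line translation of the recipe; edges and cycles_from are invented names.

```python
# Outgoing "rates" edges of the sample graph.
edges = {
    "alice": ["bobby"],
    "bobby": ["cindy"],
    "cindy": ["david"],
    "david": ["eliza"],
    "eliza": ["alice"],
}

def cycles_from(start):
    """Return every cycle that starts and ends at `start` without repeating a vertex."""
    cycles = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        for nxt in edges.get(path[-1], []):
            if nxt == start:
                cycles.append(path + [start])   # closed the loop
            elif nxt not in path:
                stack.append(path + [nxt])      # keep extending a simple path
    return cycles

print(cycles_from("alice"))
```

No fixed number of iterations is needed: the search stops on its own once every simple path has either closed the loop or run out of unvisited neighbours.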
