Graph is below:
gremlin> a = graph.addVertex("name", "alice")
gremlin> b = graph.addVertex("name", "bobby")
gremlin> c = graph.addVertex("name", "cindy")
gremlin> d = graph.addVertex("name", "david")
gremlin> e = graph.addVertex("name", "eliza")
gremlin> a.addEdge("rates",b,"tag","ruby","value",9)
gremlin> b.addEdge("rates",c,"tag","ruby","value",8)
gremlin> c.addEdge("rates",d,"tag","ruby","value",7)
gremlin> d.addEdge("rates",e,"tag","ruby","value",6)
gremlin> e.addEdge("rates",a,"tag","java","value",10)
I have 3 scripts below:
Script #1
gremlin> g.V().has('name','alice').
repeat(out()).
until(has('name','alice')).
cyclicPath().
path().by('name')`
==>[alice,bobby,cindy,david,eliza,alice]
Script #2
gremlin> g.V().has('name','alice').
repeat(outE().inV()).
until(has('name','alice')).
cyclicPath().
group().
by('name').
by(path().unfold().has('value').values('value').fold()).
next()
==>alice=[9, 8, 7, 6, 10]
Script #3
gremlin> g.V().has('name','alice').
repeat(outE().inV()).
until(has('name','alice')).
cyclicPath().
group().
by('name').
by(path().unfold().has('value').values('value').fold()).
next().collect { k, v ->
k + '=' + v.withIndex().collect { Integer it, Integer idx ->
return it * (1/(idx + 1))
}.inject(0.0) { acc,i -> acc + i }
}
==>alice=18.8333333331
My question is, how can I get the result as below listed? Just combine the 3
alice=[alice,bobby,cindy,david,eliza,alice]=[9, 8, 7, 6, 10]=18.8333333331
It's probably much easier or at least more maintainable to execute 3 queries and then merge the results as suggested by David. However, if you want to do it all in a single query, you can:
g.V().has('name','alice').as('v').
repeat(outE().as('e').inV().as('v')).
until(has('name','alice')).
store('a').
by('name').
store('a').
by(select(all, 'v').unfold().values('name').fold()).
store('a').
by(select(all, 'e').unfold().
store('x').
by(union(values('value'),
select('x').count(local)).fold()).
cap('x').
store('a').
by(unfold().limit(local, 1).fold()).unfold().
sack(assign).
by(constant(1d)).
sack(div).
by(union(constant(1d),
tail(local, 1)).sum()).
sack(mult).
by(limit(local, 1)).
sack().sum()).
cap('a')
Using your sample graph:
gremlin> g.V().has('name','alice').as('v').
......1> repeat(outE().as('e').inV().as('v')).
......2> until(has('name','alice')).
......3> store('a').
......4> by('name').
......5> store('a').
......6> by(select(all, 'v').unfold().values('name').fold()).
......7> store('a').
......8> by(select(all, 'e').unfold().
......9> store('x').
.....10> by(union(values('value'),
.....11> select('x').count(local)).fold()).
.....12> cap('x').
.....13> store('a').
.....14> by(unfold().limit(local, 1).fold()).unfold().
.....15> sack(assign).
.....16> by(constant(1d)).
.....17> sack(div).
.....18> by(union(constant(1d),
.....19> tail(local, 1)).sum()).
.....20> sack(mult).
.....21> by(limit(local, 1)).
.....22> sack().sum()).
.....23> cap('a')
==>[alice,[alice,bobby,cindy,david,eliza,alice],[9,8,7,6,10],18.833333333333332]
It has some benefits to do it all in a single query, especially as you don't have to traverse the same path over and over again, but again, it's hard to maintain such complex queries. It's probably better to just return the full path and then build the expected result on the client side.
Gremlin code is executed in a Groovy executor, so all Groovy operators are valid here. You can add your results to a list and return the list, i.e. def l = []; l << result1; l << result2; l;.
Related
I'm having difficult time figuring out query in gremlin for the following scenario. Here is the the directed graph (may be cyclic).
I want to get top N favorable nodes, starting from node "Jane", where favor is defined as:
favor(Jane->Lisa) = edge(Jane,Lisa) / total weight from outwards edges of Lisa
favor(Jane->Thomas) = favor(Jane->Thomas) + favor(Jane->Lisa) * favor(Lisa->Thomas)
favor(Jane->Jerryd) = favor(Jane->Thomas) * favor(Thomas->Jerryd) + favor(Jane->Lisa) * favor(Lisa->Jerryd)
favor(Jane->Jerryd) = [favor(Jane->Thomas) + favor(Jane->Lisa) * favor(Lisa->Thomas)] * favor(Thomas->Jerryd) + favor(Jane->Lisa) * favor(Lisa->Jerryd)
and so .. on
Here is same graph with hand calculation of what I mean,
This is fairly simple to transferse with programming but I'm not sure, how ecactly to query it with gremlin or even sparql.
Here is the query to create this example graph:
g
.addV('person').as('1').property(single, 'name', 'jane')
.addV('person').as('2').property(single, 'name', 'thomas')
.addV('person').as('3').property(single, 'name', 'lisa')
.addV('person').as('4').property(single, 'name', 'wyd')
.addV('person').as('5').property(single, 'name', 'jerryd')
.addE('favor').from('1').to('2').property('weight', 10)
.addE('favor').from('1').to('3').property('weight', 20)
.addE('favor').from('3').to('2').property('weight', 90)
.addE('favor').from('2').to('4').property('weight', 50)
.addE('favor').from('2').to('5').property('weight', 90)
.addE('favor').from('3').to('5').property('weight', 100)
All I'm looking for is:
[Lisa, computedFavor]
[Thomas, computedFavor]
[Jerryd, computedFavor]
[Wyd, computedFavor]
I'm struggling to incooperate cyclic graph to adjust weight. This is where I've been able to query so far: https://gremlify.com/f2r0zy03oxc/2
g.V().has('name','jane'). // our starting node
repeat(
union(
outE() // get only outwards edges
).
otherV().simplePath()). // produce simple path
emit().
times(10). // max depth of 10
path(). // attain path
by(valueMap())
Addressing Comments from stephen mallette:
favor(Jane->Jerryd) =
favor(Jane->Thomas) * favor(Thomas->Jerryd)
+ favor(Jane->Lisa) * favor(Lisa->Jerryd)
// note we can expand on favor(Jane->Thomas) in above expression
//
// favor(Jane->Thomas) is favor(Jane->Thomas)#directEdge +
// favor(Jane->Lisa) * favor(Lisa->Thomas)
//
Calculation Example
Jane to Lisa => 20/(10+20) => 2/3
Lisa to Jerryd => 100/(100+90) => 10/19
Jane to Lisa to Jerryd => 2/3*(10/19)
Jane to Thomas (directly) => 10/(10+20) => 1/3
Jane to Lisa to Thomas => 2/3 * 90/(100+90) => 2/3 * 9/19
Jane to Thomas => 1/3 + (2/3 * 9/19)
Thomas to Jerryd => 90/(90+50) => 9/14
Jane to Thomas to Jerryd => [1/3 + (2/3 * 9/19)] * (9/14)
Jane to Jerryd:
= Jane to Lisa to Jerryd + Jane to Thomas to Jerryd
= 2/3 * (10/19) + [1/3 + (2/3 * 9/19)] * (9/14)
Here is somewhat of psedocode:
def get_favors(graph, label="jane", starting_favor=1):
start = graph.findNode(label)
queue = [(start, starting_favor)]
favors = {}
seen = set()
while queue:
node, curr_favor = queue.popleft()
# get total weight (out edges) from this node
total_favor = 0
for (edgeW, outNode) in node.out_edges:
total_favor = total_favor + edgeW
for (edgeW, outNode) in node.out_edges:
# if there are no favors for this node
# take current favor and provide proportional favor
if outNode not in favors:
favors[outNode] = curr_favor * (edgeW / total_favor)
# it already has some favor, so we add to it
# we add proportional favor
else:
favors[outNode] += curr_favor * (edgeW / total_favor)
# if we have seen this edge, and node ignore
# otherwise, transverse
if (edgeW, outNode) not in seen:
seen.add((edgeW, outNode))
queue.append((outNode, favors[outNode]))
# sort favor by value and return top X
return favors
Here is a Gremlin query that I believe applies your formula correctly. I'll paste the full final query first then say a few words about the steps involved.
gremlin> g.withSack(1).V().
......1> has('name','jane').
......2> repeat(outE().
......3> sack(mult).
......4> by(project('w','f').
......5> by('weight').
......6> by(outV().outE().values('weight').sum()).
......7> math('w / f')).
......8> inV().
......9> simplePath()).
.....10> until(has('name','jerryd')).
.....11> sack().
.....12> sum()
==>0.768170426065163
The query starts with Jane and keeps traversing until all paths to Jerry D have been inspected. Along the way for each traverser a sack is maintained containing the calculated weight values for each relationship multiplied together. The calculation on line 6 finds all the edge weight values possible coming from the prior vertex and the math step on line 7 is used to divide the weight on the current edge by that sum. At the very end each of the computed results is added together on line 12. If you remove the final sum step you can see the intermediate results.
gremlin> g.withSack(1).V().
......1> has('name','jane').
......2> repeat(outE().
......3> sack(mult).
......4> by(project('w','f').
......5> by('weight').
......6> by(outV().outE().values('weight').sum()).
......7> math('w / f')).
......8> inV().
......9> simplePath()).
.....10> until(has('name','jerryd')).
.....11> sack()
==>0.2142857142857143
==>0.3508771929824561
==>0.2030075187969925
To see the routes taken a path step can be added to the query.
gremlin> g.withSack(1).V().
......1> has('name','jane').
......2> repeat(outE().
......3> sack(mult).
......4> by(project('w','f').
......5> by('weight').
......6> by(outV().outE().values('weight').sum()).
......7> math('w / f')).
......8> inV().
......9> simplePath()).
.....10> until(has('name','jerryd')).
.....11> local(
.....12> union(
.....13> path().
.....14> by('name').
.....15> by('weight'),
.....16> sack()).fold())
==>[[jane,10,thomas,90,jerryd],0.2142857142857143]
==>[[jane,20,lisa,100,jerryd],0.3508771929824561]
==>[[jane,20,lisa,90,thomas,90,jerryd],0.2030075187969925]
This approach also takes account of adding in any direct connections, per your formula as we can see if we use Thomas as the target.
gremlin> g.withSack(1).V().
......1> has('name','jane').
......2> repeat(outE().
......3> sack(mult).
......4> by(project('w','f').
......5> by('weight').
......6> by(outV().outE().values('weight').sum()).
......7> math('w / f')).
......8> inV().
......9> simplePath()).
.....10> until(has('name','thomas')).
.....11> local(
.....12> union(
.....13> path().
.....14> by('name').
.....15> by('weight'),
.....16> sack()).fold())
==>[[jane,10,thomas],0.3333333333333333]
==>[[jane,20,lisa,90,thomas],0.3157894736842105]
These extra steps are not needed but having the path included is useful when debugging queries like this. Also, and this is not necessary but perhaps just for general interest, I will add that you can also get to the final answer from here but the very first query I included is all you really need.
g.withSack(1).V().
has('name','jane').
repeat(outE().
sack(mult).
by(project('w','f').
by('weight').
by(outV().outE().values('weight').sum()).
math('w / f')).
inV().
simplePath()).
until(has('name','thomas')).
local(
union(
path().
by('name').
by('weight'),
sack()).fold().tail(local)).
sum()
==>0.6491228070175439
If any of this is unclear or I have mis-understood the formula, please let me know.
EDITED to add
To find the results for all people reachable from Jane I had to modify the query a little bit. The unfold at the end is just to make the results easier to read.
gremlin> g.withSack(1).V().
......1> has('name','jane').
......2> repeat(outE().
......3> sack(mult).
......4> by(project('w','f').
......5> by('weight').
......6> by(outV().outE().values('weight').sum()).
......7> math('w / f')).
......8> inV().
......9> simplePath()).
.....10> emit().
.....11> local(
.....12> union(
.....13> path().
.....14> by('name').
.....15> by('weight').unfold(),
.....16> sack()).fold()).
.....17> group().
.....18> by(tail(local,2).limit(local,1)).
.....19> by(tail(local).sum()).
.....20> unfold()
==>jerryd=0.768170426065163
==>wyd=0.23182957393483708
==>lisa=0.6666666666666666
==>thomas=0.6491228070175439
The final group step on line 17 uses the path results to calculate the total favor for each unique name found. To see the paths you can run the query with the group step removed.
gremlin> g.withSack(1).V().
......1> has('name','jane').
......2> repeat(outE().
......3> sack(mult).
......4> by(project('w','f').
......5> by('weight').
......6> by(outV().outE().values('weight').sum()).
......7> math('w / f')).
......8> inV().
......9> simplePath()).
.....10> emit().
.....11> local(
.....12> union(
.....13> path().
.....14> by('name').
.....15> by('weight').unfold(),
.....16> sack()).fold())
==>[jane,10,thomas,0.3333333333333333]
==>[jane,20,lisa,0.6666666666666666]
==>[jane,10,thomas,50,wyd,0.11904761904761904]
==>[jane,10,thomas,90,jerryd,0.2142857142857143]
==>[jane,20,lisa,90,thomas,0.3157894736842105]
==>[jane,20,lisa,100,jerryd,0.3508771929824561]
==>[jane,20,lisa,90,thomas,50,wyd,0.11278195488721804]
==>[jane,20,lisa,90,thomas,90,jerryd,0.2030075187969925]
This answer is quite elegant and best for the environment involved with Neptune and Python. I offer a second for reference, in case others come across this question. From the moment I saw this question I could only ever picture it as being solved with a VertexProgram in OLAP fashion with a GraphComputer. As a result, I had a hard time thinking of it any other way. Of course, use of a VertexProgram requires a JVM language like Java and will not work directly with Neptune. I suppose my closest workaround would have been to use Java, grab a subgraph() from Neptune and then run the custom VertexProgram in TinkerGraph locally which would be quite speedy to do.
More generally, without the Python/Neptune requirements, converting an algorithm to a VertexProgram is not a bad approach depending on the nature of the graph and the amount of data that needs to be traversed. As there isn't a lot of content out there on this topic I thought I'd offer the core of the code for it here. This is the guts of it:
#Override
public void execute(final Vertex vertex, final Messenger<Double> messenger, final Memory memory) {
// on the first pass calculate the "total favor" for all vertices
// and pass the calculated current favor forward along incident edges
// only for the "start vertex"
if (memory.isInitialIteration()) {
copyHaltedTraversersFromMemory(vertex);
final boolean startVertex = vertex.value("name").equals(nameOfStartVertrex);
final double initialFavor = startVertex ? 1d : 0d;
vertex.property(VertexProperty.Cardinality.single, FAVOR, initialFavor);
vertex.property(VertexProperty.Cardinality.single, TOTAL_FAVOR,
IteratorUtils.stream(vertex.edges(Direction.OUT)).mapToDouble(e -> e.value("weight")).sum());
if (startVertex) {
final Iterator<Edge> incidents = vertex.edges(Direction.OUT);
memory.add(VOTE_TO_HALT, !incidents.hasNext());
while (incidents.hasNext()) {
final Edge incident = incidents.next();
messenger.sendMessage(MessageScope.Global.of(incident.inVertex()),
(double) incident.value("weight") / (double) vertex.value(TOTAL_FAVOR));
}
}
} else {
// on future passes, sum all the incoming "favor" and add it to
// the "favor" property of each vertex. then once again pass the
// current favor to incident edges. this will keep happening
// until the message passing stops.
final Iterator<Double> messages = messenger.receiveMessages();
final boolean hasMessages = messages.hasNext();
if (hasMessages) {
double adjacentFavor = IteratorUtils.reduce(messages, 0.0d, Double::sum);
vertex.property(VertexProperty.Cardinality.single, FAVOR, (double) vertex.value(FAVOR) + adjacentFavor);
final Iterator<Edge> incidents = vertex.edges(Direction.OUT);
memory.add(VOTE_TO_HALT, !incidents.hasNext());
while (incidents.hasNext()) {
final Edge incident = incidents.next();
messenger.sendMessage(MessageScope.Global.of(incident.inVertex()),
adjacentFavor * ((double) incident.value("weight") / (double) vertex.value(TOTAL_FAVOR)));
}
}
}
}
The above is then executed as:
ComputerResult result = graph.compute().program(FavorVertexProgram.build().name("jane").create()).submit().get();
GraphTraversalSource rg = result.graph().traversal();
Traversal elements = rg.V().elementMap();
and that "elements" traversal yields:
{id=0, label=person, ^favor=1.0, name=jane, ^totalFavor=30.0}
{id=2, label=person, ^favor=0.6491228070175439, name=thomas, ^totalFavor=140.0}
{id=4, label=person, ^favor=0.6666666666666666, name=lisa, ^totalFavor=190.0}
{id=6, label=person, ^favor=0.23182957393483708, name=wyd, ^totalFavor=0.0}
{id=8, label=person, ^favor=0.768170426065163, name=jerryd, ^totalFavor=0.0}
I have a graph that looks something like the following:
pathway -> pathway_component -> gene -> organism
You can make an example graph like so:
m1 = g.addV('pathway').property('pathway_name', 'M00002').next()
m2 = g.addV('pathway').property('pathway_name', 'M00527').next()
c1 = g.addV('pathway_component').property('name', 'K00001').next()
c2 = g.addV('pathway_component').property('name', 'K00002').next()
c3 = g.addV('pathway_component').property('name', 'K00003').next()
g.addE('partof').from(c1).to(m1).iterate()
g.addE('partof').from(c2).to(m1).iterate()
g.addE('partof').from(c3).to(m2).iterate()
g1 = g.addV('gene').property('name', 'G00001').next()
g2 = g.addV('gene').property('name', 'G00002').next()
g3 = g.addV('gene').property('name', 'G00003').next()
g4 = g.addV('gene').property('name', 'G00004').next()
g5 = g.addV('gene').property('name', 'G00005').next()
g6 = g.addV('gene').property('name', 'G00006').next()
g7 = g.addV('gene').property('name', 'G00007').next()
g8 = g.addV('gene').property('name', 'G00008').next()
g.addE('isa').from(g1).to(c1).iterate()
g.addE('isa').from(g2).to(c3).iterate()
g.addE('isa').from(g3).to(c1).iterate()
g.addE('isa').from(g4).to(c2).iterate()
g.addE('isa').from(g5).to(c3).iterate()
g.addE('isa').from(g6).to(c1).iterate()
g.addE('isa').from(g7).to(c1).iterate()
g.addE('isa').from(g8).to(c2).iterate()
o1 = g.addV('organism').property('name', 'O000001').next()
o2 = g.addV('organism').property('name', 'O000002').next()
o3 = g.addV('organism').property('name', 'O000003').next()
o4 = g.addV('organism').property('name', 'O000004').next()
g.addE('partof').from(g1).to(o1).iterate()
g.addE('partof').from(g2).to(o1).iterate()
g.addE('partof').from(g3).to(o2).iterate()
g.addE('partof').from(g4).to(o2).iterate()
g.addE('partof').from(g5).to(o3).iterate()
g.addE('partof').from(g6).to(o3).iterate()
g.addE('partof').from(g7).to(o4).iterate()
g.addE('partof').from(g8).to(o4).iterate()
I'd like to count the genes per pathway per organism, so that the results look something like:
organism_1 pathway_1 gene_count
organism_1 pathway_2 gene_count
organism_2 pathway_1 gene_count
organism_2 pathway_2 gene_count
But so far I haven't figured it out. I tried the following:
g.V().has('pathway', 'pathway_name', within('M00002', 'M00527')).project('organism', 'pathway', 'count').
by(__.in().hasLabel('pathway_component').
in().hasLabel('gene').
out().hasLabel('organism').
values('name')).
by('pathway_name').
by(__.in().hasLabel('pathway_component').
in().hasLabel('gene').
count())
But it looks like the grouping is wrong:
==>[organism:O000001,pathway:M00002,count:6]
==>[organism:O000001,pathway:M00527,count:2]
In this case it seems like all of the organisms and their counts are being grouped together (there are four organisms) for the two pathways listed. I'd expect to see something like:
O000001 M00002 1
O000001 M00527 1
O000002 M00002 2
O000002 M00527 0
O000003 M00002 1
O000003 M00527 1
O000004 M00002 2
O000004 M00527 0
How can I split out the results by both different organisms and different pathways?
Hopefully the final query below helps. I showed the steps I used to get there, part of which was making sure I understood the structure of your data.
First I wanted to see the shape of the graph.
gremlin> g.V().hasLabel('pathway').
......1> in().hasLabel('pathway_component').
......2> in().hasLabel('gene').
......3> out().hasLabel('organism').
......4> path().
......5> by('pathway_name').
......6> by('name').
......7> by('name').
......8> by('name')
==>[M00002,K00001,G00006,O000003]
==>[M00002,K00001,G00007,O000004]
==>[M00002,K00001,G00001,O000001]
==>[M00002,K00001,G00003,O000002]
==>[M00002,K00002,G00004,O000002]
==>[M00002,K00002,G00008,O000004]
==>[M00527,K00003,G00005,O000003]
==>[M00527,K00003,G00002,O000001]
Then I used path and group to learn a bit more about these relationship groupings.
gremlin> g.V().hasLabel('pathway').
......1> in().hasLabel('pathway_component').
......2> in().hasLabel('gene').
......3> out().hasLabel('organism').as('org').
......4> group().
......5> by(select('org').by('name')).
......6> by(
......7> path().
......8> by('pathway_name').
......9> by('name').
.....10> by('name').
.....11> by('name').fold()).
.....12> unfold()
==>O000004=[path[M00002, K00001, G00007, O000004], path[M00002, K00002, G00008, O000004]]
==>O000003=[path[M00002, K00001, G00006, O000003], path[M00527, K00003, G00005, O000003]]
==>O000002=[path[M00002, K00001, G00003, O000002], path[M00002, K00002, G00004, O000002]]
==>O000001=[path[M00002, K00001, G00001, O000001], path[M00527, K00003, G00002, O000001]]
Finally I changed the above query to nest two groups
gremlin> g.V().hasLabel('pathway').as('pathway').
......1> in().hasLabel('pathway_component').
......2> in().hasLabel('gene').as('gene').
......3> out().hasLabel('organism').as('org').
......4> group().
......5> by(select('org').by('name')).
......6> by(
......7> group().
......8> by(select('pathway').by('pathway_name')).
......9> by(select('gene').by('name').fold())).
.....10> unfold()
==>O000004={M00002=[G00007, G00008]}
==>O000003={M00002=[G00006], M00527=[G00005]}
==>O000002={M00002=[G00003, G00004]}
==>O000001={M00002=[G00001], M00527=[G00002]}
This yields the organism, the pathway name and the genes.
Building on that I changed the query again to generate the counts. I hope this is close to what you needed.
gremlin> g.V().hasLabel('pathway').as('pathway').
......1> in().hasLabel('pathway_component').
......2> in().hasLabel('gene').as('gene').
......3> out().hasLabel('organism').as('org').
......4> group().
......5> by(select('org').by('name')).
......6> by(
......7> group().
......8> by(select('pathway').by('pathway_name')).
......9> by(select('gene').by('name').fold().count(local))).
.....10> unfold()
==>O000004={M00002=2}
==>O000003={M00002=1, M00527=1}
==>O000002={M00002=2}
==>O000001={M00002=1, M00527=1}
Sample data: TinkerPop Modern Graph
Conditions:
Is vadas connected to lop within 2 hops
Is vadas connected to peter within 3 hops
Is vadas connected to does-not-exists in 1 hops (a search that wont give any results)
Dummy searches with expected results
Conditions 1 AND 2
=> [vadas-marko-lop, vadas-marko-lop-peter]
Conditions 1 OR 3
=> [vadas-marko-lop]
What I was able to get
Conditions 1 AND 2
gremlin> g.V().has("person", "name", "vadas").as("from")
.select("from").as("to1").repeat(both().as("to1")).times(2).emit().has("software", "name", "lop")
.select("from").as("to2").repeat(both().as("to2")).times(3).emit().has("person", "name", "peter")
.project("a", "b")
.by(select(all, "to1").unfold().values("name").fold())
.by(select(all, "to2").unfold().values("name").fold())
==>[a:[vadas,marko,lop],b:[vadas,marko,lop,peter]]
Conditions 1 OR 2
gremlin> g.V().has("person", "name", "vadas").as("nodes")
.union(repeat(both().as("nodes")).times(2).emit().has("software", "name", "lop"),
out().has("x", "y", "does-not-exist").as("nodes"))
.project("a")
.by(select(all, "nodes").unfold().values("name").fold())
==>[a:[vadas,marko,lop]]
So how to achieve this I have two different query formats, is there a way to writer a query format that can do both?
And this did not work, anything wrong here? Does not return the nodes that have been traversed
g.V().has("person", "name", "vadas").as("nodes")
.or(
repeat(both().as("nodes")).times(2).emit().has("software", "name", "lop"),
repeat(both().as("nodes")).times(3).emit().has("person", "name", "peter")
)
.project("a").by(select(all, "nodes").unfold().values("name").fold())
==>[a:[vadas]]
// Expect paths to be printed here vadas..lop, vadas...peter
I don't know if I understand what you're after, but if you just need something like a query template, then maybe this will help:
gremlin> conditions = [
......1> [filter: {has("software", "name", "lop")}, distance: 2],
......2> [filter: {has("person", "name", "peter")}, distance: 3],
......3> [filter: {has("x", "y", "does-not-exist")}, distance: 1]]
==>[filter:groovysh_evaluate$_run_closure1#378bd86d,distance:2]
==>[filter:groovysh_evaluate$_run_closure2#2189e7a7,distance:3]
==>[filter:groovysh_evaluate$_run_closure3#69b2f8e5,distance:1]
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().has("person", "name", "vadas").
......1> union(repeat(both().simplePath()).
......2> times(conditions[0].distance).
......3> emit().
......4> filter(conditions[0].filter()).store("x"),
......5> repeat(both().simplePath()).
......6> times(conditions[1].distance).
......7> emit().
......8> filter(conditions[1].filter()).store("x")).
......9> barrier().
.....10> filter(select("x").
.....11> and(unfold().filter(conditions[0].filter()),
.....12> unfold().filter(conditions[1].filter()))).
.....13> path().
.....14> by("name")
==>[vadas,marko,lop]
==>[vadas,marko,lop,peter]
gremlin> g.V().has("person", "name", "vadas").
......1> union(repeat(both().simplePath()).
......2> times(conditions[0].distance).
......3> emit().
......4> filter(conditions[0].filter()).store("x"),
......5> repeat(both().simplePath()).
......6> times(conditions[2].distance).
......7> emit().
......8> filter(conditions[2].filter()).store("x")).
......9> barrier().
.....10> filter(select("x").
.....11> or(unfold().filter(conditions[0].filter()),
.....12> unfold().filter(conditions[2].filter()))).
.....13> path().
.....14> by("name")
==>[vadas,marko,lop]
And a little more abstraction should make it clearer that the two queries only differ in 1 step (and vs or):
apply = { condition ->
repeat(both().simplePath()).
times(condition.distance).
emit().
filter(condition.filter()).store("x")
}
verify = { condition ->
unfold().filter(condition.filter())
}
// condition 1 AND 2
g.V().has("person", "name", "vadas").
union(apply(conditions[0]),
apply(conditions[1])).
barrier().
filter(select("x").
and(verify(conditions[0]),
verify(conditions[1]))).
path().
by("name")
// condition 1 OR 3
g.V().has("person", "name", "vadas").
union(apply(conditions[0]),
apply(conditions[2])).
barrier().
filter(select("x").
or(verify(conditions[0]),
verify(conditions[2]))).
path().
by("name")
gremlin> a = graph.addVertex("name", "alice")
gremlin> b = graph.addVertex("name", "bobby")
gremlin> c = graph.addVertex("name", "cindy")
gremlin> d = graph.addVertex("name", "david")
gremlin> e = graph.addVertex("name", "eliza")
gremlin> a.addEdge("rates",b,"tag","ruby","value",9)
gremlin> b.addEdge("rates",c,"tag","ruby","value",8)
gremlin> c.addEdge("rates",d,"tag","ruby","value",7)
gremlin> d.addEdge("rates",e,"tag","ruby","value",6)
gremlin> e.addEdge("rates",a,"tag","java","value",9)
g.V().has('name', 'alice').repeat(out()).times(6).cyclicPath().path().by('name')
I want to end with alice node. and I want to repeat all the step not want to specify time as 6. The requirement is I want to get all the loop from alice or get all the loops from the graph.
You can refer to the Cycle Detection section in the TinkerPop Recipes - it adapts fairly easily to your sample graph:
gremlin> g.V().has('name', 'alice').as('a').
......1> repeat(out().simplePath()).
......2> emit(loops().is(gt(1))).
......3> both().where(eq('a')).
......4> path().
......5> by('name').
......6> dedup().
......7> by(unfold().order().dedup().fold())
==>[alice,bobby,cindy,david,eliza,alice]
My code should read the 4 columns, split them into vertices for the first 2 columns, and edge properties for the last two columns.The CSV file has 33 unique vertices in 37 lines of data. What I don't understand is why I get 74 vertices instead and 37 edges. Interestingly enough, if I omit the addE statment I just get 37 vertices.
Obviously the property portion hasn't been included as I've been trying to resolve my current issues.
1\t2\tstep\tcmp
2\t3\tconductor\tna
3\t4\tswitch\tZ300
\t for tab
etc.
My code is:
graph = TinkerGraph.open()
graph.createIndex('myId', Vertex.class)
g = graph.traversal()
getOrCreate = { myid ->
p = g.V('myId', myid)
if (!p.hasNext())
{g.addV('connector').property('myId',myid) }
else
{p.next()}
}
new File('Continuity.txt').eachLine {
if (!it.startsWith("#")){
def row = it .split('\t')
def fromVertex = getOrCreate(row[0])
def toVertex = getOrCreate(row[1])
g.addE("connection").from(fromVertex).to(toVertex).iterate()
}
}
There's at least on problem in the code that I see. In this line:
p = g.V('myId', myid)
you are telling gremlin to find vertices with ids "myId" and whatever the value of the variable myid is. You instead want:
p = g.V().has('myId', myid)
The syntax you were using is from TinkerPop 2.x. I tested your code this way with some other changes and it seems to work properly now:
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> graph.createIndex('myId', Vertex.class)
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> getOrCreate = { myid ->
......1> if (!g.V().has('myId', myid).hasNext())
......2> g.addV('connector').property('myId',myid)
......3> else
......4> g.V().has('myId', myid)
......5> }
==>groovysh_evaluate$_run_closure1#29d37757
gremlin> g.addE('connection').from(getOrCreate(1)).to(getOrCreate(2)).iterate()
gremlin> g.addE('connection').from(getOrCreate(1)).to(getOrCreate(2)).iterate()
gremlin> g.V()
==>v[0]
==>v[2]
gremlin> g.E()
==>e[4][2-connection->0]
==>e[5][2-connection->0]