Using repeat() and times() to create multiple edges at once

How do I use the times() step on my repeat() to create multiple, identical edges at once?
g.V().has('Label1', 'id', '1234').repeat(addE('HAS').from(g.V().has('Label2', 'id', '5678'))).times(5)
I would expect this to add my edge 5 times to this vertex, but in fact it returns nothing when times() is greater than 1. Why is that, and how would I use repeat() correctly?

I'm not sure what graph database you are using, but I'm somewhat surprised you don't get an error with that bit of Gremlin. On TinkerGraph the error yields a hint as to what is wrong:
gremlin> g.V().has('person','name','marko').repeat(addE('knows').from(V().has('person','name','stephen'))).times(5)
org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerEdge cannot be cast to org.apache.tinkerpop.gremlin.structure.Vertex
Type ':help' or ':h' for help.
Display stack trace? [yN]
The repeat() step is not meant to simply execute the same child traversal with the same input for each iteration. It is meant to execute the same child traversal with the output of the previous iteration as the new input. That means on the first iteration we initialize that child traversal of:
addE('knows').from(V().has('person','name','stephen'))
with the "marko" vertex, but the output of that traversal is an Edge (because the output of addE() is an Edge). On the second iteration that edge becomes the input to addE(), which produces the error, as you can't call addE() on an edge.
You can still use repeat() for this type of flow control, but you need to arrange the child traversal so that its input is that same initial vertex on each iteration:
gremlin> g.addV('person').property('name','marko').addV('person').property('name','stephen').iterate()
gremlin> g.V().has('person','name','marko').as('m').
......1> V().has('person','name','stephen').as('s').
......2> repeat(select('m').addE('knows').to('s')).
......3> times(3).iterate()
gremlin> g.E()
==>e[4][0-knows->2]
==>e[5][0-knows->2]
==>e[6][0-knows->2]
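To make the difference concrete, here is a rough plain-Python model of the behaviour described above. It is only a sketch and not TinkerPop code: `Vertex`, `Edge`, `Graph`, `add_edge` and this `repeat` helper are hypothetical stand-ins that mimic how each iteration receives the previous iteration's output.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vertex:
    name: str


@dataclass(frozen=True)
class Edge:
    label: str
    out_v: Vertex
    in_v: Vertex


@dataclass
class Graph:
    edges: list = field(default_factory=list)

    def add_edge(self, label, out_v, in_v):
        # Like addE(), this stand-in only accepts vertices as endpoints.
        if not isinstance(out_v, Vertex) or not isinstance(in_v, Vertex):
            raise TypeError("edge endpoints must be vertices")
        edge = Edge(label, out_v, in_v)
        self.edges.append(edge)
        return edge


def repeat(start, body, times):
    """Feed the output of each iteration into the next one."""
    current = start
    for _ in range(times):
        current = body(current)
    return current


marko, stephen = Vertex("marko"), Vertex("stephen")

# Failing shape: the body returns an Edge, which becomes the next input.
broken = Graph()
try:
    repeat(marko, lambda t: broken.add_edge("knows", stephen, t), times=5)
    failed = False
except TypeError:
    failed = True

# Working shape: the body ignores its input and goes back to the saved
# vertices, playing the role of select('m').addE('knows').to('s').
fixed = Graph()
repeat(marko, lambda _t: fixed.add_edge("knows", marko, stephen), times=3)
```

In this model the first shape fails on the second iteration because the input is no longer a vertex, while the second shape adds three edges between the same two vertices.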

Related

Gremlin recursive graph traversal with parent and child relationship

I want to traverse a tree and aggregate each parent with its immediate children only. How would I do this using Gremlin and aggregate the result into a list structure such as arrayOf({parent1, child}, {child, child1}, ...)?
In this case I want to output [{0,1}, {0,2}, {1,8}, {1,6}, {2,7}, {2,9}, {8,16}, {8,14}, {8,15}, {7,17}]
The order isn't important. Also, note that I want to avoid any circular edges, which can exist on the same node only (no circular loop is possible from a child vertex to a parent).
Each vertex has the label city and each edge has the label highway.
g.V().hasLabel("city").toList().map(x->x.id()+x.edges(Direction.OUT,"highway").collect(Collectors.toList())
My query is timing out and I was wondering if there is a faster way to do this. I have about 5000 vertices, and any two vertices are connected by only one edge.

You can get close to what you are looking for using the Gremlin tree step while also avoiding Groovy closures. Assuming the following setup:
gremlin> g = traversal().withGraph(TinkerGraph.open())
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
g.addV('0').as('0').
addV('1').as('1').
addV('2').as('2').
addV('6').as('6').
addV('7').as('7').
addV('8').as('8').
addV('9').as('9').
addV('14').as('14').
addV('15').as('15').
addV('16').as('16').
addV('17').as('17').
addE('route').from('0').to('1').
addE('route').from('0').to('2').
addE('route').from('1').to('6').
addE('route').from('1').to('8').
addE('route').from('2').to('2').
addE('route').from('2').to('9').
addE('route').from('2').to('7').
addE('route').from('7').to('17').
addE('route').from('8').to('14').
addE('route').from('8').to('15').
addE('route').from('8').to('16').iterate()
A query can be written to return the tree (minus cycles) as follows:
gremlin> g.V().hasLabel('0').
......1> repeat(out().simplePath()).
......2> until(__.not(out())).
......3> tree().
......4> by(label)
==>[0:[1:[6:[],8:[14:[],15:[],16:[]]],2:[7:[17:[]],9:[]]]]
An alternative approach, that also avoids using closures:
gremlin> g.V().local(union(label(),out().simplePath().label()).fold())
==>[17]
==>[0,1,2]
==>[1,6,8]
==>[2,9,7]
==>[6]
==>[7,17]
==>[8,14,15,16]
==>[9]
==>[14]
==>[15]
==>[16]
This can be further refined to leave out leaf-only nodes using:
gremlin> g.V().local(union(label(),out().simplePath().label()).fold()).where(count(local).is(gt(1)))
==>[0,1,2]
==>[1,6,8]
==>[2,9,7]
==>[7,17]
==>[8,14,15,16]
In your code you can then create the final pairs or perhaps extend the Gremlin to break up the result even more. Hopefully these approaches will prove more efficient than falling back onto closures (which are not going to be very portable to other TinkerPop implementations that do not support in-line code).
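As one possible way to create the final pairs in application code, here is a minimal Python sketch. It assumes the rows arrive as lists of label strings exactly as in the filtered output above; the `to_pairs` helper is a hypothetical name, not part of any Gremlin driver.

```python
# Rows as returned by the filtered traversal above: [parent, child, child, ...]
rows = [
    ["0", "1", "2"],
    ["1", "6", "8"],
    ["2", "9", "7"],
    ["7", "17"],
    ["8", "14", "15", "16"],
]


def to_pairs(rows):
    """Split each [parent, child...] row into (parent, child) pairs."""
    pairs = []
    for parent, *children in rows:
        pairs.extend((parent, child) for child in children)
    return pairs


pairs = to_pairs(rows)
```

For the sample graph this yields the ten parent/child pairs asked for in the question, with the self-loop on vertex 2 already excluded by simplePath().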

Gremlin query not working - 2 x vertex if doesn't exist and add edge

I'm using gremlin and have the statement below. I want to:
create a vertex if one does not exist
create another vertex if one does not exist
create an edge between the vertices
The edge doesn't get created. I'd really appreciate help with the logic/approach here.
g.V().has('User','refId','435').
fold().coalesce(unfold(),addV(label, 'User').
property('name','John Smith').property('refId','435'))
.as('user').
V().has('JobTitle','name','Advisor').
fold().coalesce(unfold(),addV(label,'JobTitle').
property('name','Advisor'))
.as('jobtitle').
addE('REGISTERED_AS').from('user').to('jobtitle')

Given your most recent comment on Kfir's answer that includes your latest code, I think that there are several problems to correct with your approach. First note that addV() does not take a long list of labels and properties. I'm surprised that didn't generate an error for you. addV() just takes the vertex label as an argument and then you use property() to provide the associated key/value pairs.
g.V().has('User', 'refId', '435').
fold().
coalesce(unfold(),
addV('User').
property('name', 'John Smith').property('refId', '435').property('firstName', 'John').
property('lastName', 'Smith').property('JobTitle', 'Chief Executive Officer')).as('user').
V().has('JobTitle', 'name', 'Advisor').
fold().
coalesce(unfold(), addV('JobTitle').property('name', 'Advisor')).as('jobtitle').
V().
addE('REGISTERED_AS').
from('user').
to('jobtitle')
There is an extra V() right before addE() which would basically call addE() for every vertex you have in your graph rather than just for the one vertex to which you want to add an edge.
g.V().has('User', 'refId', '435').
fold().
coalesce(unfold(),
addV('User').
property('name', 'John Smith').property('refId', '435').property('firstName', 'John').
property('lastName', 'Smith').property('JobTitle', 'Chief Executive Officer')).as('user').
V().has('JobTitle', 'name', 'Advisor').
fold().
coalesce(unfold(), addV('JobTitle').property('name', 'Advisor')).as('jobtitle').
addE('REGISTERED_AS').
from('user').
to('jobtitle')
So, now things look syntactically correct, but there is still a problem and it stems from this:
gremlin> g.V(1).as('x').fold().unfold().addE('self').from('x').to('x')
The provided traverser does not map to a value: v[1]->[SelectOneStep(last,x)]
Type ':help' or ':h' for help.
Display stack trace? [yN]
The path information in the traversal is lost after a reducing step (i.e. fold()), so you can't select back to "x" after that. You need to rework your traversal a bit so that the second lookup doesn't require the fold():
g.V().has('User', 'refId', '435').
fold().
coalesce(unfold(),
addV('User').
property('name', 'John Smith').property('refId', '435').property('firstName', 'John').
property('lastName', 'Smith').property('JobTitle', 'Chief Executive Officer')).as('user').
coalesce(V().has('JobTitle', 'name', 'Advisor'),
addV('JobTitle').property('name', 'Advisor')).as('jobtitle').
addE('REGISTERED_AS').
from('user').
to('jobtitle')
That really just means using coalesce() more directly without the fold() and unfold() pattern. You really only need that pattern at the start of a traversal to ensure that a traverser stays alive in the stream (i.e. if the user doesn't exist fold() produces an empty list which becomes the new traverser and the traversal will continue executing).
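A rough way to picture why fold() keeps the traversal alive is the plain-Python model below. It is a sketch under loud assumptions: traversers are modelled as list items, and `fold`, `coalesce`, `add_vertex`, `get_or_create` and `create_without_fold` are hypothetical helpers, not TinkerPop APIs.

```python
def fold(stream):
    # A reducing step: always emits exactly one traverser (a list),
    # even when the incoming stream is empty.
    return [list(stream)]


def coalesce(traverser, *options):
    # Return the output of the first option that produces anything.
    for option in options:
        result = list(option(traverser))
        if result:
            return result
    return []


def add_vertex(vertices, ref_id):
    vertex = {"label": "User", "refId": ref_id}
    vertices.append(vertex)
    return vertex


def get_or_create(vertices, ref_id):
    matches = (v for v in vertices if v["refId"] == ref_id)
    results = []
    for folded in fold(matches):  # runs once, even with no matches
        results.extend(
            coalesce(
                folded,
                lambda items: items,  # plays the role of unfold()
                lambda _items: [add_vertex(vertices, ref_id)],  # addV(...)
            )
        )
    return results


def create_without_fold(vertices, ref_id):
    matches = (v for v in vertices if v["refId"] == ref_id)
    results = []
    for traverser in matches:  # empty stream: the body never runs
        results.extend(
            coalesce(
                traverser,
                lambda v: [v],
                lambda _v: [add_vertex(vertices, ref_id)],
            )
        )
    return results
```

With an empty vertex list, `create_without_fold` does nothing because there is no traverser to carry the traversal forward, while `get_or_create` creates the vertex on the first call and returns the existing one on later calls.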

The question code is partial. Some alignment would help.
Nevertheless, I think your problem is in the as('jobtitle'), which is inside the coalesce statement. That is, if the vertex exists, we don't get to the second traversal and the as statement is not executed. Same for the as('user').
To solve, just move the as statements outside the coalesce.

Traverse implied edge through property match?

I'm trying to create edges between vertices based on matching the value of a property in each vertex, making what is currently an implied relationship into an explicit relationship. I've been unsuccessful in writing a gremlin traversal that will match up related vertices.
Specifically, given the following graph:
g = TinkerGraph.open().traversal()
g.addV('person').property('name','alice')
g.addV('person').property('name','bob').property('spouse','carol')
g.addV('person').property('name','carol')
g.addV('person').property('name','dave').property('spouse', 'alice')
I was hoping I could create a spouse_of relation using the following
> g.V().has('spouse').as('x')
.V().has('name', select('x').by('spouse'))
.addE('spouse_of').from('x')
but instead of creating one edge from bob to carol and another edge from dave to alice, bob and dave each end up with spouse_of edges to all of the vertices (including themselves):
> g.V().out('spouse_of').path().by('name')
==>[bob,alice]
==>[bob,bob]
==>[bob,carol]
==>[bob,dave]
==>[dave,carol]
==>[dave,dave]
==>[dave,alice]
==>[dave,bob]
It almost seems as if the has filter isn't being applied, or, to use RDBMS terms, as if I'm ending up with an "outer join" instead of the "inner join" I'd intended.
Any suggestions? Am I overlooking something trivial or profound (local vs global scope, perhaps)? Is there any way of accomplishing this in a single traversal query, or do I have to iterate through g.V().has('spouse') and create edges individually?

You can make this happen in a single traversal, but has() is not meant to work quite that way. The pattern for this type of traversal is described in the Traversal Induced Values section of the Gremlin Recipes tutorial, but you can see it in action here:
gremlin> g.V().hasLabel('person').has('spouse').as('s').
......1> V().hasLabel('person').as('x').
......2> where('x', eq('s')).
......3> by('name').
......4> by('spouse').
......5> addE('spouse_of').from('s').to('x')
==>e[10][2-spouse_of->5]
==>e[11][7-spouse_of->0]
gremlin> g.E().project('x','y').by(outV().values('name')).by(inV().values('name'))
==>[x:bob,y:carol]
==>[x:dave,y:alice]
While this can be done in a single traversal, note that depending on the size of your data it could be an expensive one, as I'm not sure that either call to V() will be optimized by any graph. While it's neat to use this form, you may find it faster to take an approach that ensures an index is used, which might mean issuing multiple queries to solve the problem.
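To illustrate the multiple-lookup idea in application code, here is a small Python sketch. It does not talk to a graph at all: the `people` list mirrors the sample data from the question, and the `by_name` dictionary is an assumed stand-in for whatever indexed lookup on 'name' your graph provides.

```python
people = [
    {"name": "alice"},
    {"name": "bob", "spouse": "carol"},
    {"name": "carol"},
    {"name": "dave", "spouse": "alice"},
]

# Stand-in for an indexed lookup on the 'name' property.
by_name = {person["name"]: person for person in people}

edges = []
for person in people:
    spouse_name = person.get("spouse")
    if spouse_name is None:
        continue  # no implied relationship to make explicit
    target = by_name.get(spouse_name)  # one keyed lookup per person
    if target is not None:
        edges.append((person["name"], "spouse_of", target["name"]))
```

Each person with a spouse property triggers a single keyed lookup instead of a comparison against every other vertex, which is the trade-off described above.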

Why is the depth of leaf node a garbage value instead of being one?

In the recipes section of TinkerPop/Gremlin, the command below is used to compute the depth of a node. It computes the depth by counting the node itself. However, when we run this command on a leaf node, it returns a garbage value instead of 1. Could someone please clarify this?
Command:
g.V().has('name','F').repeat(__.in()).emit().path().count(local).max()
If 'F' is a leaf node then it returns an incorrect value. I think it should return 1.

Using the same data from the maximum depth recipe, here are some of the results:
gremlin> g.V().has('name', 'F').repeat(__.in()).emit().path().count(local).max()
==>5
gremlin> g.V().has('name', 'C').repeat(__.in()).emit().path().count(local).max()
==>3
gremlin> g.V().has('name', 'A').repeat(__.in()).emit().path().count(local).max()
==>-2147483648
We can learn more about the behavior by removing the last couple steps:
gremlin> g.V().has('name', 'C').repeat(__.in()).emit().path()
==>[v[4],v[6]]
==>[v[4],v[2]]
==>[v[4],v[2],v[0]]
gremlin> g.V().has('name', 'A').repeat(__.in()).emit().path()
gremlin>
You can see that 'C' has 3 paths and 'A' has 0 paths. This is because all of the traversers were killed before anything was emitted. If you move the emit() step before the repeat() step, you will get your desired behavior:
gremlin> g.V().has('name', 'A').emit().repeat(__.in()).path()
==>[v[0]]
gremlin> g.V().has('name', 'A').emit().repeat(__.in()).path().count(local).max()
==>1
You can read more about the repeat() step and its interactions with the emit() step in the TinkerPop documentation. Specifically, there is a callout box that states:
If emit() is placed after repeat(), it is evaluated on the traversers leaving the repeat-traversal. If emit() is placed before repeat(), it is evaluated on the traversers prior to entering the repeat-traversal.
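The effect of the emit() placement can be modelled in a few lines of plain Python. This is only a sketch: the three-node `parents` map is a hypothetical tree (not the recipe's data), and `paths` is a made-up helper that imitates repeat(__.in()) with emit() either before or after it.

```python
# Hypothetical tree: each node maps to what in() would return for it.
parents = {"A": [], "B": ["A"], "C": ["B"]}


def paths(start, emit_first):
    """Collect emitted paths for repeat(in()) with no until() or times()."""
    emitted = []
    frontier = [[start]]
    if emit_first:
        # emit() before repeat(): the starting traverser is emitted
        # before it enters the repeat traversal.
        emitted.extend(frontier)
    while frontier:
        # One iteration of the repeat body: step to every parent.
        frontier = [path + [p] for path in frontier for p in parents[path[-1]]]
        # Traversers that survive the body are emitted.
        emitted.extend(frontier)
    return emitted


def max_depth(start, emit_first):
    # default=None makes the "nothing was emitted" case explicit.
    return max((len(p) for p in paths(start, emit_first)), default=None)
```

In this model `max_depth("A", emit_first=False)` has nothing to reduce because no path was ever emitted, while `max_depth("A", emit_first=True)` counts the node itself and gives 1.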

How to limit the number of times a branch is traversed

Starting with the toy graph, I can find which vertices are creators by looking for vertices that have 'created' out edges:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> graph.traversal().V().as('a').out('created').select('a').values('name')
==>marko
==>josh
==>josh
==>peter
I can filter out the duplicates with the dedup step...
gremlin> graph.traversal().V().as('a').out('created').select('a').dedup().values('name')
==>marko
==>josh
==>peter
...but this only alters the output, not the path followed by the Gremlin. If creators can be supernodes I'd like to tell the query to output 'a' once it finds its first 'created' edge and to then stop traversing the out step for the current 'a' and proceed to the next 'a'. Can this be done?
This syntax has the desired output. Do they behave like I intend?
graph.traversal().V().where(out('created').count().is(gt(0))).values('name')
graph.traversal().V().where(out('created').limit(1).count().is(gt(0))).values('name')
Is there a better recipe?
EDIT: I just found an example in the where doc (example 2) that shows the presence of a link being evaluated as truth (may not be wording this correctly):
graph.traversal().V().where(out('created')).values('name')
There's a warning about the star-graph problem, which I think doesn't apply here because, and I'm guessing, there is only one where step that tests a branch?

Your last example is the way to go.
g.V().where(out('created')).values('name')
Strategies will optimize that for you and turn it into:
g.V().where(outE('created')).values('name')
Also, .where(outE('created')) will not iterate through all the out-edges. It behaves just like a .hasNext() check, hence no supernode problem.
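The hasNext()-style behaviour can be pictured with a lazy iterator in plain Python. This is just an analogy, not how any particular graph implements it: `out_edges`, `has_created_count` and `has_created_lazy` are hypothetical helpers, and the `visited` list only exists to count how many edges were touched.

```python
def out_edges(vertex, visited):
    # Lazily yield edges, recording each one that is actually touched.
    for edge in vertex["created"]:
        visited.append(edge)
        yield edge


def has_created_count(vertex, visited):
    # Like out('created').count().is(gt(0)): walks every edge first.
    return sum(1 for _ in out_edges(vertex, visited)) > 0


def has_created_lazy(vertex, visited):
    # Like a hasNext() check: stops after the first edge.
    return next(out_edges(vertex, visited), None) is not None


# A pretend supernode with many 'created' edges.
supernode = {"name": "josh", "created": [f"e{i}" for i in range(1000)]}

visited_count, visited_lazy = [], []
found_by_count = has_created_count(supernode, visited_count)
found_by_lazy = has_created_lazy(supernode, visited_lazy)
```

Both checks give the same answer, but the counting version touches all 1000 edges while the lazy version touches only the first one.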
