Gremlin: What is the difference between transform and sideEffect?

total n00b here when it comes to Gremlin, but would someone mind telling me the difference between a "transform" and "sideEffect"? I read the Gremlin Docs and both seem to "at times" take inputs, massage the data, and produce outputs.
I thought I was on to something when it looked like "transforms" were mostly "getters", but then why would it be called "transform"? And then I saw that they put functions under transform that did much more than just get data.
Any help is appreciated, thanks in advance!

A transform behaves as a map function, where the value in the Gremlin Pipeline is literally transformed to a different value:
gremlin> g.V.transform{it.name}
==>marko
==>vadas
==>lop
==>josh
==>ripple
==>peter
Note that in the example above, we've gone from a Vertex before the transform step to the string value of the name property on that Vertex. Now watch what happens with sideEffect:
gremlin> g.V.sideEffect{it.name}
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
The Vertex passes through sideEffect unchanged (i.e. it exits the sideEffect as a Vertex, just as it went in). The point of a sideEffect is that you can do something at that step of the pipeline that does "something" unrelated to the processing of the pipeline itself. In other words, the return value of the sideEffect closure is basically ignored.
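To make the distinction concrete outside of Gremlin, here is a minimal sketch in plain Python. It is only a model of the two behaviours, not the Gremlin API: the function names `transform` and `side_effect`, the dict-based vertices, and the `seen` list are all assumptions made up for illustration.

```python
from typing import Any, Callable, Iterable, Iterator


def transform(items: Iterable[Any], fn: Callable[[Any], Any]) -> Iterator[Any]:
    """Model of a transform step: emit the closure's return value."""
    for item in items:
        yield fn(item)


def side_effect(items: Iterable[Any], fn: Callable[[Any], Any]) -> Iterator[Any]:
    """Model of a sideEffect step: run the closure, then emit the input unchanged."""
    for item in items:
        fn(item)  # the return value is deliberately discarded
        yield item


# Hypothetical stand-ins for two vertices of the toy graph.
vertices = [{"id": 1, "name": "marko"}, {"id": 2, "name": "vadas"}]

# transform: vertices go in, names come out.
mapped = list(transform(vertices, lambda v: v["name"]))

# side_effect: vertices go in, the same vertices come out;
# the closure is only useful for what it does on the side (here, filling `seen`).
seen = []
passed = list(side_effect(vertices, lambda v: seen.append(v["name"])))
```

Here `mapped` holds the names, `passed` still holds the original vertices, and `seen` was filled as a by-product of the second pipeline.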

Related

Use property of one part of graph traverse as filter for other

I want to do the following:
Traverse part of graph
Take property from first traverse
Put it into other traversal as filter
Get filtered value
When I run the following in the Gremlin console:
g = TinkerGraph.open().traversal()
g.addV('a').property(id, 1).property('b',2)
g.addV('a').property(id, 2).property('b',2).property('c',3)
g.V(2).properties().key().limit(1).as('q').select('q')
g.V(2).properties().key().limit(1).as('q').V(1).properties().key()
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().select('q')
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is('b'))
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is(select('q')))
I get:
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('a').property(id, 1).property('b',2)
==>v[1]
gremlin> g.addV('a').property(id, 2).property('b',2).property('c',3)
==>v[2]
gremlin> g.V(2).properties().key().limit(1).as('q').select('q')
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key()
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key().select('q')
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is('b'))
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is(select('q')))
gremlin>
So I can see that:
My first traverse path gets property of 'b'
Selecting by direct usage of literal 'b' works
Using a projection to filter by 'b' does not work.
So the question is: how do I use a value from one part of a traversal as a filter in another traversal in the case described above?
My use case is that I have a prototype vertex. I want to grab all its properties (and maybe values), and find all vertices which are similar to that prototype.
Another alternative is to store a query inside a property of the prototype, read it, and evaluate it to get the vertices which are filtered by it.
I know that I can join strings on the application side, but I want to stay in the code-less part of Gremlin to keep proper provider portability.
UPDATE:
Example from official documentation:
gremlin> firstYear = g.V().hasLabel('person').
local(properties('location').values('startTime').min()).
max().next()
==>2004
gremlin> l = g.V().hasLabel('person').as('person').
properties('location').or(has('endTime',gt(firstYear)),hasNot('endTime')).as('location').
valueMap().as('times').
select('person','location','times').by('name').by(value).by().toList()
How can I use firstYear without keeping a variable in the console, and instead reference that value from within the query itself?
I see your question was answered on the Gremlin Users list. [1] Copying the answer here for others that may search for the same question.
What you're looking for is:
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(eq('q'))
See documentation for the Where Step to learn about the different usage patterns of where.
[1] https://groups.google.com/forum/#!topic/gremlin-users/f1NfwUw9ZVI
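As a rough mental model of why where(eq('q')) matches while where(__.is(select('q'))) returns nothing, here is a small plain-Python sketch. It is not the TinkerPop implementation. The `Traverser` class and both helper functions are hypothetical, and the explanation of the empty result (the value being compared against a traversal object that is never evaluated) is my assumption about what happens, not something stated in the answer above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Traverser:
    """Hypothetical traverser: a current value plus the values labelled with as()."""
    value: Any
    labels: Dict[str, Any] = field(default_factory=dict)


def where_eq_label(traversers: List[Traverser], label: str) -> List[Traverser]:
    """Model of where(eq('q')): the string names a label, which is looked up per traverser."""
    return [t for t in traversers if t.value == t.labels[label]]


def is_eq_literal(traversers: List[Traverser], literal: Any) -> List[Traverser]:
    """Model of is(x): the argument is compared as-is, with no label lookup."""
    return [t for t in traversers if t.value == literal]


# State after ...limit(1).as('q').V(1).properties().key():
# the current value is the key 'b', and 'q' was labelled as 'b' earlier.
current = [Traverser("b", {"q": "b"})]

kept_by_label = where_eq_label(current, "q")    # label resolved to 'b', so it matches
kept_by_literal = is_eq_literal(current, "b")   # literal 'b' matches too

# Assumption: an unevaluated select('q') is just an object, never equal to the string 'b'.
unevaluated_select = object()
kept_by_object = is_eq_literal(current, unevaluated_select)
```

In this model `kept_by_label` and `kept_by_literal` each keep the traverser, while `kept_by_object` is empty, which mirrors the console output in the question.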

TinkerPop gremlin count vertices only in a path()

When I make a query of a path e.g.:
g.V(1).inE().outV().inE().outV().inE().outV().path()
There are both vertices and edges in the path(), is there any way to count the number of vertices in the path only and ignore edges?
Gremlin is missing something important to make this really easy to do - it doesn't discern types very well for purposes of filtering, thus TINKERPOP-2234. I've altered your example a bit so that we could have something a little trickier to work with:
gremlin> g.V(1).repeat(outE().inV()).emit().path()
==>[v[1],e[9][1-created->3],v[3]]
==>[v[1],e[7][1-knows->2],v[2]]
==>[v[1],e[8][1-knows->4],v[4]]
==>[v[1],e[8][1-knows->4],v[4],e[10][4-created->5],v[5]]
==>[v[1],e[8][1-knows->4],v[4],e[11][4-created->3],v[3]]
With repeat() we get variable length Path instances so dynamic counting of the vertices is a bit trickier than the fixed example you have in your question where the pattern of the path is known and a count is easy to discern just from the Gremlin itself. So, with a dynamic number of vertices and without TINKERPOP-2234 you have to get creative. A typical strategy is to just filter away the edges by way of some label or property value that is unique to vertices:
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().hasLabel('person','software').fold())
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[1],v[4],v[5]]
==>[v[1],v[4],v[3]]
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().hasLabel('person','software').fold()).count(local)
==>2
==>2
==>2
==>3
==>3
Or perhaps use a property unique to all edges:
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().not(has('weight')).fold())
==>[v[1],v[3]]
==>[v[1],v[2]]
==>[v[1],v[4]]
==>[v[1],v[4],v[5]]
==>[v[1],v[4],v[3]]
gremlin> g.V(1).repeat(outE().inV()).emit().path().map(unfold().not(has('weight')).fold()).count(local)
==>2
==>2
==>2
==>3
==>3
If your schema doesn't have properties or labels that allow for this, you could probably use your traversal pattern to come up with some math to figure it out. In my case, I know that the number of vertices in my Path will always be (pathLength + 1) / 2, because vertices and edges alternate and the path starts and ends on a vertex, so:
gremlin> g.V(1).repeat(outE().inV()).emit().path().as('p').math('(p + 1) / 2').by(count(local))
==>2.0
==>2.0
==>2.0
==>3.0
==>3.0
Hopefully, one of those ways will inspire you to a solution.
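The arithmetic behind the math() approach can be checked with a tiny plain-Python sketch. The paths below are hand-written string stand-ins for the Path objects shown above, and `vertex_count` is a hypothetical helper, not part of Gremlin. It assumes the path strictly alternates vertex, edge, vertex and so always has odd length.

```python
def vertex_count(path_length: int) -> int:
    """Vertices in an alternating v, e, v, ... path that starts and ends on a vertex."""
    if path_length % 2 == 0:
        raise ValueError("an alternating path that ends on a vertex has odd length")
    return (path_length + 1) // 2


# String stand-ins for two of the paths printed by the console above.
paths = [
    ["v[1]", "e[9]", "v[3]"],
    ["v[1]", "e[8]", "v[4]", "e[10]", "v[5]"],
]

# Count by formula, using only the length of each path.
by_formula = [vertex_count(len(p)) for p in paths]

# Count by actually filtering out the edges, as a cross-check.
by_filter = [sum(1 for element in p if element.startswith("v")) for p in paths]
```

Both lists agree, which is the whole point of the formula: you never need to inspect the element types if the shape of the path is known.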
+1 for typeOf predicate support in Gremlin (TINKERPOP-2234).
In addition to @stephan's answer, you can also mark and select only vertices:
g.V().repeat(outE().inV().as('v')).times(3).select(all,'v')
Also, if the graph provider supports it, you can use {it.class}:
g.V().repeat(outE().inV().as('v')).times(3).path()
.map(unfold().groupCount().by({it.class}))

Using repeat() and times() to create multiple edges at once

How do I use the times() Step on my repeat(..) to create multiple, identical edges at once?
g.V().has('Label1', 'id', '1234').repeat(addE('HAS').from(g.V().has('Label2', 'id', '5678'))).times(5)
I would expect it to add my edge 5 times to this vertex, but in fact it returns nothing when times() is greater than 1. Why is that, and how would I use repeat() correctly?
I'm not sure what graph database you are using, but I'm somewhat surprised you don't get an error with that bit of Gremlin and that error should yield a hint as to what is wrong.
gremlin> g.V().has('person','name','marko').repeat(addE('knows').from(V().has('person','name','stephen'))).times(5)
org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerEdge cannot be cast to org.apache.tinkerpop.gremlin.structure.Vertex
Type ':help' or ':h' for help.
Display stack trace? [yN]
The repeat() step is not meant to simply execute the same child traversal with the same input for each iteration. It is meant to execute the same child traversal with the output of the previous iteration as the new input. That means on the first iteration we initialize that child traversal of:
addE('knows').from(V().has('person','name','stephen'))
with the "marko" vertex, but the output of that traversal is an Edge (because the output of addE() is an Edge). On the second iteration that edge becomes the input to addE() and therefore....error....as you can't call addE() on an edge.
You can still use repeat() for this type of flow control, but you need to arrange the child traversal so that the input is that same initial vertex on each iteration:
gremlin> g.addV('person').property('name','marko').addV('person').property('name','stephen').iterate()
gremlin> g.V().has('person','name','marko').as('m').
......1> V().has('person','name','stephen').as('s').
......2> repeat(select('m').addE('knows').to('s')).
......3> times(3).iterate()
gremlin> g.E()
==>e[4][0-knows->2]
==>e[5][0-knows->2]
==>e[6][0-knows->2]
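The feed-forward behaviour of repeat() can be modelled in a few lines of plain Python. This is only a sketch of the idea, not TinkerPop code: `Vertex`, `Edge`, `repeat`, and `make_add_edge` are hypothetical names, and the `TypeError` stands in for the cast error shown in the console.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Vertex:
    name: str


@dataclass(frozen=True)
class Edge:
    out_v: Vertex
    in_v: Vertex


def repeat(step: Callable[[Any], Any], times: int, start: Any) -> Any:
    """Model of repeat().times(n): each iteration receives the previous output."""
    current = start
    for _ in range(times):
        current = step(current)
    return current


def make_add_edge(target: Vertex, store: List[Edge]) -> Callable[[Any], Edge]:
    """Build a step that adds an edge from its input to `target`, like addE()."""
    def add_edge(source: Any) -> Edge:
        if not isinstance(source, Vertex):
            raise TypeError("addE needs a vertex, got " + type(source).__name__)
        edge = Edge(source, target)
        store.append(edge)
        return edge
    return add_edge


marko, stephen = Vertex("marko"), Vertex("stephen")

# Naive form: the second iteration is handed the Edge produced by the first.
naive_edges: List[Edge] = []
try:
    repeat(make_add_edge(stephen, naive_edges), 3, marko)
    naive_error = None
except TypeError as exc:
    naive_error = str(exc)

# Fixed form: ignore the previous output and go back to marko, like select('m').
fixed_edges: List[Edge] = []
add = make_add_edge(stephen, fixed_edges)
repeat(lambda _previous: add(marko), 3, marko)
```

The naive loop fails on its second iteration, while the fixed loop ends up with three edges from marko to stephen, matching the g.E() output above.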

Gremlin: Between function

Sample data: Modern graph
I am trying to do a starts-with search.
This returns marko
gremlin> g.V().has("name", between("m", "mz")).values("name")
==>marko
This returns none
gremlin> g.V().has("name", between("m", "ma")).values("name")
gremlin>
So do I always have to use "z"? What is the logic of using "z" in the between function? I can't find any documentation for it.
I was silently corrected by the Gremlin Guru on that answer I'd given (it's updated now), but the second argument to between is exclusive so if you want all "name" that start with "m" then you need the second argument to be "n".
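The half-open range explains both results from the question. Here is a plain-Python sketch of it, assuming between(a, b) means a <= value < b and that strings compare lexicographically (which Python and Java agree on for plain ASCII names like these). The `between` helper and the extra name "mzee" are made up for illustration.

```python
from typing import Callable, List


def between(start: str, end: str) -> Callable[[str], bool]:
    """Model of between(start, end): start inclusive, end exclusive."""
    return lambda value: start <= value < end


def matching(names: List[str], predicate: Callable[[str], bool]) -> List[str]:
    return [name for name in names if predicate(name)]


# Names from the modern graph, plus one hypothetical name to show the "mz" gap.
names = ["marko", "vadas", "lop", "josh", "ripple", "peter", "mzee"]

m_to_mz = matching(names, between("m", "mz"))  # "marko" < "mz", but "mzee" is not
m_to_ma = matching(names, between("m", "ma"))  # "marko" sorts after "ma", so nothing
m_to_n = matching(names, between("m", "n"))    # everything starting with "m"
```

So "mz" only appears to work because no name in the sample data sorts at or after "mz". Using the next character after the prefix ("n" for "m") as the exclusive upper bound covers every string with that prefix.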

How to limit the number of times a branch is traversed

Starting with the toy graph, I can find which vertices are creators by looking for vertices that have 'created' out edges:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> graph.traversal().V().as('a').out('created').select('a').values('name')
==>marko
==>josh
==>josh
==>peter
I can filter out the duplicates with the dedup step...
gremlin> graph.traversal().V().as('a').out('created').select('a').dedup().values('name')
==>marko
==>josh
==>peter
...but this only alters the output, not the path followed by the Gremlin. If creators can be supernodes I'd like to tell the query to output 'a' once it finds its first 'created' edge and to then stop traversing the out step for the current 'a' and proceed to the next 'a'. Can this be done?
These queries produce the desired output. Do they behave the way I intend?
graph.traversal().V().where(out('created').count().is(gt(0))).values('name')
graph.traversal().V().where(out('created').limit(1).count().is(gt(0))).values('name')
Is there a better recipe?
EDIT: I just found an example in the where doc (example 2) that shows the presence of a link being evaluated as truth (may not be wording this correctly):
graph.traversal().V().where(out('created')).values('name')
There's a warning about the star-graph problem, which I think doesn't apply here because, and I'm guessing, there is only one where step that tests a branch?
Your last example is the way to go.
g.V().where(out('created')).values('name')
Strategies will optimize that for you and turn it into:
g.V().where(outE('created')).values('name')
Also, .where(outE('created')) will not iterate through all the out-edges, it's just like a .hasNext(), hence no supernode problem.
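To illustrate the hasNext() comparison, here is a plain-Python sketch that counts how many edges each style of check touches. It models only the naive reading of each query. All names are hypothetical, and a real provider's strategies may well rewrite the counting form into the cheaper one.

```python
from typing import Iterator, List

_MISSING = object()  # sentinel so that a falsy edge would still count as "found"


def counting_edges(total: int, visited: List[int]) -> Iterator[int]:
    """Lazily yield edge ids for a supernode, recording each edge that gets touched."""
    for edge_id in range(total):
        visited.append(edge_id)
        yield edge_id


def exists(edges: Iterator[int]) -> bool:
    """Model of where(outE('created')): stop as soon as one edge is seen."""
    return next(edges, _MISSING) is not _MISSING


def count_gt_zero(edges: Iterator[int]) -> bool:
    """Model of an unoptimized count().is(gt(0)): walk every edge, then compare."""
    return sum(1 for _ in edges) > 0


visited_by_exists: List[int] = []
visited_by_count: List[int] = []

found_by_exists = exists(counting_edges(1000, visited_by_exists))
found_by_count = count_gt_zero(counting_edges(1000, visited_by_count))
```

Both checks give the same answer, but the existence check touches a single edge while the naive count touches all 1000, which is the supernode cost the question is trying to avoid.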

Resources