Gremlin: what is the identity _() pipe?

I am using Gremlin from Java. Since most of the examples I have found on the internet are written in Groovy, I assumed the identity pipe had a special meaning in Groovy, but I discovered that it also exists in the Java API. So what does it mean?

In TinkerPop 2.x, _() turns an arbitrary object into a pipeline:
gremlin> x = [1,2,3]
==>1
==>2
==>3
gremlin> x._().transform{it+1}
==>2
==>3
==>4
gremlin> x = g.E.has('weight', T.gt, 0.5f).toList()
==>e[10][4-created->5]
==>e[8][1-knows->4]
gremlin> x.inV
==>[StartPipe, InPipe]
==>[StartPipe, InPipe]
gremlin> x._().inV
==>v[5]
==>v[4]
In TinkerPop 3.x, it has basically the same meaning, but we tend to refer to it as the start of an anonymous traversal, one that is not bound to a graph instance. You can read more about it in a recent post on the Gremlin Users mailing list. Here's how it looks in 3.x:
gremlin> __(1,2,3)
==>1
==>2
==>3
gremlin> __(1,2,3).map{g.V(it.get()).next()}
==>v[1]
==>v[2]
==>v[3]
Examples of its usage are sprinkled throughout this section:
http://tinkerpop.incubator.apache.org/docs/3.0.0-incubating/#graph-traversal-steps
You actually see it more often than you might think, because the documentation statically imports it so that you don't have to write the "__()" explicitly. For example:
gremlin> g.V().out('knows').where(out('created'))
==>v[4]
is really:
gremlin> g.V().out('knows').where(__().out('created'))
==>v[4]
Finally, note that in TinkerPop 3.x, Groovy is just a "flavor" of Gremlin that introduces a small bit of syntactic sugar. The Gremlin language in 3.x over Java 8 looks mostly identical to the Groovy flavor.
http://tinkerpop.incubator.apache.org/docs/3.0.0-incubating/#_on_gremlin_language_variants

Can't tell you exactly about the value it brings, but it looks like it just maps to itself.
"The identity()-step (map) is an identity function which maps the current object to itself."
http://tinkerpop.apache.org/docs/current/reference/#identity-step
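As a rough analogy in plain Java (this is not TinkerPop code, and the class name IdentitySketch is made up for illustration), an identity mapping over a stream hands every element through unchanged, which is all the quoted sentence is saying:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class IdentitySketch {
    // Maps every element to itself, so the output list equals the input list.
    static <T> List<T> passThrough(List<T> input) {
        return input.stream()
                .map(Function.<T>identity())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [1, 2, 3]
        System.out.println(passThrough(Arrays.asList(1, 2, 3)));
    }
}
```

On its own such a step does nothing useful; its value is as a starting point onto which further steps can be chained.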

Related

Gremlin Path query with optional empty/termination of path

For a graph like this, the 'Assembly' vertices can optionally be linked to the 'Template' ones with a 'typeof' edge. To retrieve the graph hierarchy, my current query uses path(). That works fine when there are no template links, but to get the templates too, I add an extra hop. Adding the extra 'typeof' hop excludes the assemblies that have no link to a template, which is not what I want. Is there a way to get both types of path in a single query?
g.addV('Model').as('1').
addV('Assembly').as('2').
addV('Assembly').as('3').
addV('TemplateAssembly').as('4').
addE('child').from('1').to('2').
addE('child').from('1').to('4').
addE('typeof').from('2').to('4').
addE('child').from('2').to('3')
Query that gets the extra path elements for those that are linked (only returns 1 result):
g.V('64984')
.repeat(out('child').hasLabel('Assembly').dedup().out('typeof'))
.emit().path()
Query that gets all the assemblies (but is missing the extra path element for those with a typeof link):
g.V('64984')
.repeat(out('child').hasLabel('Assembly').dedup())
.emit().path()
Combining the edge label names as shown below seems to be a reasonable approach unless I am missing a subtlety in the question.
gremlin> g.V().hasLabel('Model').
......1> repeat(out('child','typeof')).
......2> until(__.not(out())).
......3> path().
......4> by(label)
==>[Model,TemplateAssembly]
==>[Model,Assembly,Assembly]
==>[Model,Assembly,TemplateAssembly]
I suspect, though, that the parallel edges from the Model to the Assembly and the TemplateAssembly are not really wanted as separate paths, as shown here:
gremlin> g.V().hasLabel('Model').
......1> repeat(out('child','typeof')).
......2> until(__.not(out())).
......3> path().
......4> by(union(id(),label()).fold())
==>[[0,Model],[3,TemplateAssembly]]
==>[[0,Model],[1,Assembly],[2,Assembly]]
==>[[0,Model],[1,Assembly],[3,TemplateAssembly]]
I think changing the data model so that a TemplateAssembly is connected to the next Assembly would significantly simplify the query.
Adding the diagram that helped me reason about the graph.

Reflect on Gremlin traversal type (Edge, Vertex, Property) in CHOOSE step, possible?

I am extending the sparql-to-gremlin code to support fully and partially unbound predicate queries that automated processes can use to explore the graph structure. The idea is that you could connect to some graph DB, ask a fully unbound query with some limit, and get vertex properties, edge types, edge properties, etc., which can then be explored further.
I can now solve a fully unbound query and one that has the subject bound to a vertex. Now I am trying to put it together into a multi-literal query, and I am finding that the Gremlin match() step would need to reflect on the type of the traverser before it can decide which steps actually apply. For example, if the traversal results in a Vertex, asking for out/in edges and properties makes sense; if it's an Edge, though, asking for out/in edges does not make sense and actually results in errors about an unexpected type.
Thus the question: is it possible to write a kind of "switch" statement that reflects on the type and then only asks for things that make sense in that context?
Here’s one type of SPARQL query that I am trying to support (based on the Graph of the Gods described here https://old-docs.janusgraph.org/0.1.0/getting-started.html):
https://old-docs.janusgraph.org/0.1.0/images/graph-of-the-gods-2.png
SELECT ?BATTLE ?PRED ?VALUE
WHERE {
vid:6 ep:battled ?BATTLE .
?BATTLE ?PRED ?VALUE .
}
Here we are starting from a vertex with id 6, grabbing the outgoing edge reference with “battled” label, then grabbing all possible properties of the edge along with their values.
Here the vertex with id 6 is Hercules, which has 3 outgoing edges with the label "battled" going to the vertices with id 9 (Nemean), 10 (Hydra) and 11 (Cerberus). I would want ?PRED to be bound to v:id (edge id), v:label (edge label), v:time (edge time property value), v:place (edge place property value), and eps:battled (an extension to sparql-to-gremlin relating an edge to its IN vertex).
I think that I follow your problem, and I don't think I have a good answer for you. At the moment, Gremlin isn't terribly good at type detection, and the issue remains open as TINKERPOP-2234. The typical workaround when you have a mixed set of elements in a stream is to use a step like coalesce() or choose() as a form of switch statement and then figure out some filter that can identify the object type. So here are some mixed results that I've contrived:
gremlin> g.V().union(outE(),__.in())
==>e[9][1-created->3]
==>e[7][1-knows->2]
==>e[8][1-knows->4]
==>v[1]
==>v[1]
==>v[4]
==>v[6]
==>e[10][4-created->5]
==>e[11][4-created->3]
==>v[1]
==>v[4]
==>e[12][6-created->3]
and then I test with hasLabel() for labels I know to belong to vertices only, then everything else must be an edge:
gremlin> g.V().union(outE(),__.in()).choose(hasLabel('person','software'), values('name'), values('weight'))
==>0.4
==>0.5
==>1.0
==>marko
==>marko
==>josh
==>peter
==>1.0
==>0.4
==>marko
==>josh
==>0.2
Not ideal obviously but it typically resolves most people's problems. Hopefully we will see TINKERPOP-2234 solved for 3.5.0.
Another possible workaround is to use a lambda which works well for some use cases though we try to avoid them when possible:
gremlin> g.V().union(outE(),__.in()).choose(filter{it.get() instanceof Vertex}, values('name'), values('weight'))
==>0.4
==>0.5
==>1.0
==>marko
==>marko
==>josh
==>peter
==>1.0
==>0.4
==>marko
==>josh
==>0.2

Some way to count elements in a property in gremlin?

Hi, I have a node that represents an email, and one of its properties is the recipients (e.g. { 'john#doe.com', 'max#example.com' }).
Is there a way that I can count how many recipients each email has?
So assuming:
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV().property('emails','["x#x.com","y#y.com"]')
==>v[0]
I'd start by saying that you should probably parse that JSON to multi-properties if your graph supports it because then you get a more natural approach to dealing with that data. It would be something like:
g.V(0L).values('emails').count()
Gremlin simply doesn't have native methods for parsing JSON so that leaves you with two options I guess:
Use a lambda and a Groovy JsonSlurper
Just return the JSON string and parse it on the client to get your count in your native programming language.
If you were using a lambda it would look like this:
gremlin> json = new groovy.json.JsonSlurper()
==>groovy.json.JsonSlurper@421a4ee1
gremlin> g.V(0L).values('emails').map{json.parseText(it.get())}
==>[x#x.com,y#y.com]
gremlin> g.V(0L).values('emails').map{json.parseText(it.get())}.count(local)
==>2
Note that this assumes your graph supports lambdas and that you can make use of JsonSlurper in that environment. We typically try to get folks to avoid lambdas so your best choice would be to model your data better (i.e. multiproperties or a List) or to process the JSON locally.
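For the second option (processing the JSON on the client), a minimal Java sketch using only the standard library might look like the following. It assumes the property value is a flat JSON array of strings, as in the example above, and simply counts the string literals; the class and method names are made up, and a real client would more likely use a proper JSON library:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecipientCountSketch {
    // Matches one JSON string literal, allowing backslash-escaped characters inside it.
    private static final Pattern JSON_STRING =
            Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"");

    // Counts the elements of a FLAT JSON array of strings (assumption: no nesting, no non-string values).
    static int countRecipients(String jsonArray) {
        Matcher m = JSON_STRING.matcher(jsonArray);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The value as it would come back from g.V(0L).values('emails')
        String emails = "[\"x#x.com\",\"y#y.com\"]";
        System.out.println(countRecipients(emails)); // prints 2
    }
}
```

This keeps the traversal itself lambda-free: the server only returns the stored string, and the counting happens in the application.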

Gremlin console addV seem not to add vertices

In Gremlin Console, in order to add a vertex, I do the following:
// One time initialization
graph = TinkerGraph.open()
g = graph.traversal()
// Add the vertex
g.addV('somelabel')
And in the console I get:
==>v[0]
But if I try to traverse the vertex:
g.V(0)
I get nothing in the console, as if the index was wrong.
Proof of that (the fact that I get nothing) is:
g.V(0).count()
==>0
If I define the id myself instead:
g.addV('somelabel').property(id, 1)
Everything works fine:
g.V(1)
==>v[1]
But I would not like to define the ids myself...
What am I doing (or thinking) wrong?
Software version is JanusGraph 0.2.2, Apache TinkerPop 3.2.9
You are not using JanusGraph here but TinkerGraph, an in-memory graph store that is often used for testing or simple examples.
TinkerGraph uses long ids by default, which means that it cannot find your vertex when you pass an int id. It should work when you use a parameter of type long:
gremlin> g.addV('somelabel')
==>v[0]
gremlin> g.V(0)
gremlin> g.V(0L)
==>v[0]
The configuration section for TinkerGraph explains how this can be changed so that a different type is used for its ids.
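The underlying behaviour can be reproduced in plain Java. The sketch below is not TinkerGraph's actual code; it assumes, for illustration only, a store that keeps elements in a map keyed by the id object, and the class name is made up. A boxed Integer 0 and a boxed Long 0 are never equal, so an int lookup misses an entry stored under a long key:

```java
import java.util.HashMap;
import java.util.Map;

public class IdLookupSketch {
    // Hypothetical stand-in for an id-keyed vertex store (an assumption, not the real implementation).
    static Map<Object, String> store() {
        Map<Object, String> vertices = new HashMap<>();
        vertices.put(0L, "v[0]"); // the generated id is a Long
        return vertices;
    }

    public static void main(String[] args) {
        Map<Object, String> vertices = store();
        System.out.println(vertices.get(0));  // Integer key: prints null
        System.out.println(vertices.get(0L)); // Long key: prints v[0]
    }
}
```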

Is com.tinkerpop.pipes.Pipe implementation safe to cache and re-use later on other graph instances?

I currently create a Pipe as shown below.
Pipe pipe = Gremlin.compile("_().out('knows').name");
After it has been created, I cache it so that it can be re-used with different graphs, as below:
Graph graph = TinkerGraphFactory.createTinkerGraph();
pipe.setStarts(new SingleIterator<Vertex>(graph.getVertex(1)));
for (Object name : pipe) {
    System.out.println((String) name);
}
Is this all right? I ask because the javadoc of AbstractPipe says:
public void reset()
Description copied from interface: Pipe
A pipe may maintain state. Reset is used to remove state. The general use case for reset() is to reuse a pipe in another computation without having to create a new Pipe object. An implementation of this method should be recursive whereby the starts (if a Pipe) should have this method called on it.
Specified by:
reset in interface Pipe<S,E>
I've never trusted reset, despite what the javadocs say on the matter. However, this test seems to work:
gremlin> pipe = _().out('knows').name;null
==>null
gremlin> pipe.setStarts(new SingleIterator<Vertex>(g.getVertex(1)));
==>null
gremlin> pipe
==>vadas
==>josh
gremlin> pipe
gremlin> pipe.setStarts(new SingleIterator<Vertex>(g.getVertex(1)));
==>null
gremlin> pipe
==>vadas
==>josh
Calling setStarts seems to properly reset the iterator within the pipe, but reset on its own doesn't seem to have much effect:
gremlin> pipe.setStarts(new SingleIterator<Vertex>(g.getVertex(1)));
==>null
gremlin> pipe
==>vadas
==>josh
gremlin> pipe.reset()
==>null
gremlin> pipe
gremlin>
All that said, I'm not sure caching the Pipeline is saving you all that much. Pipeline creation is pretty cheap and Gremlin.compile() itself caches the script after compilation so future calls to "recreate" that pipeline should be considerably faster than the first call to compile.
