Sample data: Modern graph
I am trying to do a starts-with search.
This returns marko
gremlin> g.V().has("name", between("m", "mz")).values("name")
==>marko
This returns none
gremlin> g.V().has("name", between("m", "ma")).values("name")
gremlin>
So do I always have to use "z"? What is the logic of using "z" in the between function? I can't find any documentation on this.
I was silently corrected by the Gremlin Guru on that answer I'd given (it's updated now). The second argument to between is exclusive, so if you want all "name" values that start with "m", the second argument needs to be "n". That also explains your results: "marko" sorts before "mz" but after "ma", so between("m", "ma") excludes it.
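To make the comparison concrete, here is a minimal sketch in plain Python (not Gremlin) of a half-open range check, assuming ordinary lexicographic string ordering. The helper names between and prefix_upper_bound are made up for this illustration and are not part of any Gremlin API.

```python
def between(value, lower, upper):
    # Half-open range: lower is inclusive, upper is exclusive.
    return lower <= value < upper

def prefix_upper_bound(prefix):
    # Hypothetical helper: bump the last character so that the range
    # [prefix, bound) covers every string starting with prefix.
    # Assumes a non-empty prefix whose last character is not the
    # highest possible code point.
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

names = ["marko", "vadas", "lop", "josh", "ripple", "peter"]

# "m" <= "marko" < "mz" holds, so marko is kept.
print([n for n in names if between(n, "m", "mz")])
# "marko" sorts after "ma", so nothing is kept.
print([n for n in names if between(n, "m", "ma")])
# The computed upper bound for prefix "m" is "n".
print([n for n in names if between(n, "m", prefix_upper_bound("m"))])
```

The same idea carries over to longer prefixes: under these assumptions, the exclusive upper bound for "mar" would be "mas".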
Related
I have some vertices in my graph with a String attribute called someAttribute, for example 4 vertices with different "someAttribute" values:
someAttribute = "aaa"
someAttribute = "bb1"
someAttribute = "c"
someAttribute = "d"
I sorted it using order() by "someAttribute", and I need to query for vertices whose attribute is greater than "c" (so I expect the vertex with attribute "d" in the results), but it looks like the gt predicate expects a number. How can I achieve this with a String attribute?
traversal.order().by("someAttribute")
.has("someAttribute", gt("c"))
.range(0, 2);
In general, using gt etc. on text values should work. I have tested using TinkerGraph, Amazon Neptune, and JanusGraph.
The tests below were conducted using a JanusGraph 0.6 snapshot that I have on my machine (it was made just before 0.6 was GA). I used the inmemory graph, with full graph scans enabled.
gremlin> graph=JanusGraphFactory.open('conf/mem.properties')
==>standardjanusgraph[inmemory:[127.0.0.1]]
gremlin> g=graph.traversal()
==>graphtraversalsource[standardjanusgraph[inmemory:[127.0.0.1]], standard]
gremlin> g.addV('test').property('text',"abc")
==>v[4168]
gremlin> g.addV('test').property('text',"def")
==>v[4104]
gremlin> g.addV('test').property('text',"hij")
==>v[4112]
gremlin> g.addV('test').property('text',"xyz")
==>v[8264]
gremlin> g.tx().commit()
==>null
gremlin> g.V()
09:23:24 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [()]. For better performance, use indexes
==>v[4104]
==>v[4112]
==>v[4168]
==>v[8264]
gremlin> g.V().has('text',gt('d'))
09:23:39 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [(text > d)]. For better performance, use indexes
==>v[4104]
==>v[4112]
==>v[8264]
gremlin> g.V().has('text',gt('d')).values('text')
09:23:48 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [(text > d)]. For better performance, use indexes
==>def
==>hij
==>xyz
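Note that "def" is in the result because the whole string is compared, not just its first letter. A minimal sketch in plain Python (not Gremlin), assuming the same lexicographic ordering for ASCII strings, shows the comparison on the values used above and on the values from the question:

```python
texts = ["abc", "def", "hij", "xyz"]

# A longer string that starts with "d" sorts after "d" itself,
# so "def" passes a "greater than d" filter.
print([t for t in texts if t > "d"])

# The four values from the question, filtered with "greater than c".
attrs = ["aaa", "bb1", "c", "d"]
print([a for a in attrs if a > "c"])
```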
I want to do the following:
Traverse part of graph
Take property from first traverse
Put it into other traversal as filter
Get filtered value
When I run the following in the Gremlin console:
g = TinkerGraph.open().traversal()
g.addV('a').property(id, 1).property('b',2)
g.addV('a').property(id, 2).property('b',2).property('c',3)
g.V(2).properties().key().limit(1).as('q').select('q')
g.V(2).properties().key().limit(1).as('q').V(1).properties().key()
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().select('q')
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is('b'))
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is(select('q')))
I get:
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV('a').property(id, 1).property('b',2)
==>v[1]
gremlin> g.addV('a').property(id, 2).property('b',2).property('c',3)
==>v[2]
gremlin> g.V(2).properties().key().limit(1).as('q').select('q')
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key()
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key().select('q')
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is('b'))
==>b
gremlin> g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(__.is(select('q')))
gremlin>
So I can see that:
My first traversal gets the property key 'b'
Filtering by direct use of the literal 'b' works
Using the selected value to filter by 'b' does not work.
So the question is: how do I use a value from one part of a traversal as a filter in another part, in the case described above?
My use case is that I have a prototype vertex. I want to grab all its properties (and maybe values) and find all vertices that are similar to that prototype.
Another alternative is to store a query inside a property of the prototype, read it, and evaluate it to get the vertices it matches.
I know that I can do an application-side join of strings, but I want to stay in the code-free part of Gremlin to keep proper provider portability.
UPDATE:
Example from official documentation:
gremlin> firstYear = g.V().hasLabel('person').
                          local(properties('location').values('startTime').min()).
                          max().next()
==>2004
gremlin> l = g.V().hasLabel('person').as('person').
                  properties('location').or(has('endTime',gt(firstYear)),hasNot('endTime')).as('location').
                  valueMap().as('times').
                  select('person','location','times').by('name').by(value).by().toList()
How can I reference firstYear from within the query itself, without using a console variable?
I see your question was answered on the Gremlin Users list. [1] Copying the answer here for others that may search for the same question.
What you're looking for is:
g.V(2).properties().key().limit(1).as('q').V(1).properties().key().where(eq('q'))
See documentation for the Where Step to learn about the different usage patterns of where.
[1] https://groups.google.com/forum/#!topic/gremlin-users/f1NfwUw9ZVI
In some cases, I get an inexplicable result when I use order().by(...) with coalesce(...).
Using the standard Modern graph,
gremlin> g.V()
.hasLabel("person")
.out("created")
.coalesce(values("name"), constant("x"))
.fold()
==>[lop,lop,ripple,lop]
But if I sort by name before the coalesce I get 9 lop instead of 3:
gremlin> g.V()
.hasLabel("person")
.out("created")
.order().by("name")
.coalesce(values("name"), constant("x"))
.fold()
==>[lop,lop,lop,lop,lop,lop,lop,lop,lop,ripple]
Why does the number of elements differ between the two queries?
That looks like a bug - I've created an issue in JIRA. There is a workaround, but first consider that your traversal isn't really going to work even with the bug set aside: order() will fail because the by() modulator references a key that possibly doesn't exist. So you need to account for that differently:
g.V().
hasLabel("person").
out("created").
order().by(coalesce(values('name'),constant('x')))
I then used choose() to do what coalesce() is supposed to do:
g.V().
hasLabel("person").
out("created").
order().by(coalesce(values('name'),constant('x'))).
choose(has("name"),values('name'),constant('x')).
fold()
and that seems to work fine.
Starting with the toy graph, I can find which vertices are creators by looking for vertices that have 'created' out edges:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> graph.traversal().V().as('a').out('created').select('a').values('name')
==>marko
==>josh
==>josh
==>peter
I can filter out the duplicates with the dedup step...
gremlin> graph.traversal().V().as('a').out('created').select('a').dedup().values('name')
==>marko
==>josh
==>peter
...but this only alters the output, not the path followed by the Gremlin. If creators can be supernodes I'd like to tell the query to output 'a' once it finds its first 'created' edge and to then stop traversing the out step for the current 'a' and proceed to the next 'a'. Can this be done?
These traversals have the desired output. Do they behave as I intend?
graph.traversal().V().where(out('created').count().is(gt(0))).values('name')
graph.traversal().V().where(out('created').limit(1).count().is(gt(0))).values('name')
Is there a better recipe?
EDIT: I just found an example in the where doc (example 2) that shows the presence of a link being evaluated as truth (may not be wording this correctly):
graph.traversal().V().where(out('created')).values('name')
There's a warning about the star-graph problem, which I think doesn't apply here because, and I'm guessing, there is only one where step that tests a branch?
Your last example is the way to go.
g.V().where(out('created')).values('name')
Strategies will optimize that for you and turn it into:
g.V().where(outE('created')).values('name')
Also, .where(outE('created')) will not iterate through all the out-edges. It works just like a .hasNext() check, so there is no supernode problem.
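The difference between an existence check and a full count can be sketched in plain Python (not Gremlin) with a lazy generator standing in for a supernode's out-edges. The function names are hypothetical, and count_positive models a naive, unoptimized count; an actual Gremlin provider's strategies may rewrite a count-based filter into something cheaper.

```python
def out_edges(count, visited):
    # Lazily yield edge ids, recording each one that is actually pulled.
    for edge_id in range(count):
        visited.append(edge_id)
        yield edge_id

def has_any(edges):
    # Pull at most one element, like calling hasNext() on an iterator.
    return next(edges, None) is not None

def count_positive(edges):
    # Count every element before comparing with zero.
    return sum(1 for _ in edges) > 0

visited_any = []
visited_count = []

# Both checks agree on the answer, but touch very different numbers of edges.
print(has_any(out_edges(100000, visited_any)), len(visited_any))
print(count_positive(out_edges(100000, visited_count)), len(visited_count))
```

In this model the existence check pulls a single edge, while the count walks all of them before answering.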
total n00b here when it comes to Gremlin, but would someone mind telling me the difference between a "transform" and "sideEffect"? I read the Gremlin Docs and both seem to "at times" take inputs, massage the data, and produce outputs.
I thought I was on to something when it looked like "transforms" were mostly "getters", but then why would it be called "transform"? And then I saw that they put functions under transform that did much more than just get data.
Any help is appreciated, thanks in advance!
A transform behaves as a map function, where the value in the Gremlin Pipeline is literally transformed to a different value:
gremlin> g.V.transform{it.name}
==>marko
==>vadas
==>lop
==>josh
==>ripple
==>peter
Note that in the example above, we've gone from a Vertex before the transform step to the string value of the name property on that Vertex. Now watch what happens with sideEffect:
gremlin> g.V.sideEffect{it.name}
==>v[1]
==>v[2]
==>v[3]
==>v[4]
==>v[5]
==>v[6]
The Vertex passes through sideEffect unchanged (i.e. it exits the sideEffect as a Vertex, just as it went in). The point of a sideEffect is that you can do something at that step of the pipeline that does "something" unrelated to the processing of the pipeline itself. In other words, the return value of the sideEffect closure is basically ignored.
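As a rough analogy in plain Python (not Gremlin), transform behaves like a map and sideEffect like a "tap" that runs a closure and then passes the original item along. The functions and the dictionary-based vertices below are stand-ins invented for this sketch.

```python
def transform(items, fn):
    # Map: each item is replaced by the closure's return value.
    for item in items:
        yield fn(item)

def side_effect(items, fn):
    # Tap: the closure runs, its return value is discarded,
    # and the original item continues down the pipeline.
    for item in items:
        fn(item)
        yield item

vertices = [{"id": 1, "name": "marko"}, {"id": 2, "name": "vadas"}]
seen = []

# The pipeline now carries names instead of vertices.
print(list(transform(vertices, lambda v: v["name"])))
# The pipeline still carries vertices; the names only land in `seen`.
print([v["id"] for v in side_effect(vertices, lambda v: seen.append(v["name"]))])
print(seen)
```

The closure passed to side_effect is only useful for what it does on the side (here, appending to seen), which mirrors why its return value is ignored.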