I am trying to execute a math query.
gts.V()
  .hasLabel("account")
  .has("id",42)
  .both("account_label1").as("label1")
  .and(__.identity()
    .project("a","b")
      .by(__.identity()
        .both("label1_label2")
        .both("label2_label3")
        .values("createTime"))
      .by(__.identity()
        .both("label1_label4")
        .both("label4_label5")
        .values("launchTime"))
    .math("floor((a-b)/(86400))").is(100))
  .select("label1")
  .toList()
The above query fails with this error:
The provided traverser does not map to a value: v[137]->[IdentityStep, VertexStep(BOTH,[label1_label2],vertex), VertexStep(BOTH,[label2_label3],vertex), NoOpBarrierStep(2500), PropertiesStep([createTime],value)]
Why is Gremlin injecting a NoOpBarrierStep?
What is the meaning of NoOpBarrierStep(2500)?
What would the correct Gremlin query be?
When you use project(), it expects a value for each by() modulator, and the traversal inside each by() must not produce an empty iterator. Here's a simple example:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().project('x').by(out())
==>[x:v[3]]
The provided traverser does not map to a value: v[2]->[VertexStep(OUT,vertex)]
Type ':help' or ':h' for help.
Display stack trace? [yN]
The first vertex is able to traverse out() but the next one processed by project() has no outgoing edges and therefore produces this error. In your case, that simply means that not all of your traversers can traverse both().both() or if they can, you would want to be sure that they all had "createTime" property values. Either of those scenarios could cause the problem.
You could fix this in a variety of ways. Obviously, if it's a data problem you could simply fix your data and always assume that the traversal path is right. If that's not the case, you need to write your Gremlin to be a bit more forgiving if the traversal path is not available. In my case I could do:
gremlin> g.V().project('x').by(out().fold())
==>[x:[v[3],v[2],v[4]]]
==>[x:[]]
==>[x:[]]
==>[x:[v[5],v[3]]]
==>[x:[]]
==>[x:[v[3]]]
Perhaps in your case you might do:
by(coalesce(both("label1_label2").both("label2_label3").values("createTime"),
constant('n/a')))
Note that you do not need to specify identity() for the start of your anonymous traversals.
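To make the difference between the failing and the forgiving variants concrete, here is a minimal sketch in plain Python. It is only a toy model of the behaviour described above, not TinkerPop code: the adjacency list, the vertex names and the three project_* helpers are all made up for illustration.

```python
from typing import Callable, Dict, Iterator, List

# Toy adjacency list standing in for a graph; the vertex names are made up.
OUT_EDGES: Dict[str, List[str]] = {
    "v1": ["v3", "v2", "v4"],
    "v2": [],
    "v3": [],
}


def out(vertex: str) -> Iterator[str]:
    """Yield the out-neighbours of a vertex (possibly none)."""
    return iter(OUT_EDGES[vertex])


def project_strict(vertex: str, by: Callable[[str], Iterator[str]]) -> Dict[str, str]:
    """Mimic project('x').by(...): take the first result, fail if there is none."""
    for value in by(vertex):
        return {"x": value}
    raise ValueError(f"The provided traverser does not map to a value: {vertex}")


def project_with_default(vertex: str, by: Callable[[str], Iterator[str]], default: str) -> Dict[str, str]:
    """Mimic by(coalesce(..., constant(default))): fall back when nothing is produced."""
    return {"x": next(by(vertex), default)}


def project_folded(vertex: str, by: Callable[[str], Iterator[str]]) -> Dict[str, List[str]]:
    """Mimic by(... .fold()): collect all results, so an empty list is a valid value."""
    return {"x": list(by(vertex))}
```

In this model project_strict("v2", out) raises, while project_with_default("v2", out, "n/a") and project_folded("v2", out) both return a value for "x".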
Finally, in answer to your questions about NoOpBarrierStep: that step is injected into traversals where Gremlin thinks it can take advantage of a bulking optimization, and the number in parentheses (2500 here) is the maximum number of traversers the barrier collects before passing them on. You can add barriers yourself with the barrier() step as well. Here's a quick description of "bulking" as taken from the TinkerPop Reference Documentation:
The theory behind a "bulking optimization" is simple. If there are one million traversers at vertex 1, then there is no need to calculate one million both()-computations. Instead, represent those one million traversers as a single traverser with a Traverser.bulk() equal to one million and execute both() once.
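The idea in that quote can be sketched with a Counter in plain Python. This is a toy model under my own assumptions (the tiny NEIGHBOURS graph and both helper functions are hypothetical), not how TinkerPop implements barriers internally; it only shows why merging identical traversers saves work.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy graph: vertex id -> neighbours reached by both().
NEIGHBOURS: Dict[int, List[int]] = {1: [2, 3], 2: [1], 3: [1]}


def step_unbulked(traversers: List[int]) -> Tuple[List[int], int]:
    """One adjacency lookup per traverser; returns (results, lookup count)."""
    results: List[int] = []
    lookups = 0
    for vertex in traversers:
        lookups += 1
        results.extend(NEIGHBOURS[vertex])
    return results, lookups


def step_bulked(traversers: Counter) -> Tuple[Counter, int]:
    """One adjacency lookup per distinct vertex; the bulk carries the multiplicity."""
    results: Counter = Counter()
    lookups = 0
    for vertex, bulk in traversers.items():
        lookups += 1
        for neighbour in NEIGHBOURS[vertex]:
            results[neighbour] += bulk
    return results, lookups
```

With a thousand traversers sitting at vertex 1, step_unbulked performs a thousand lookups, whereas step_bulked(Counter([1] * 1000)) performs one and produces the same multiset of results.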
I have a graph that represents database objects, parent-child relations, and dataflow relations (only between columns).
Here is my current Gremlin query (in Python), which should find the dataflow impact of a column:
g.V().has('fqn', 'some fully qualified name').
repeat(outE("flows_into").dedup().store('edges').inV()).
until(
or_(
outE("flows_into").count().is_(eq(0)),
cyclicPath(),
)
).
cap('edges').
unfold().
dedup().
map(lambda: "g.V(it.get().getVertex(0).id()).in('child').in('child').id().next().toString() + ',' + g.V(it.get().getVertex(1).id()).in('child').in('child').id().next().toString()").
toList()
This query should return all edges that are impacted, directly or transitively, by the initial column.
In some cases, however, I do not care about the column-level detail and want the edges at the schema level. That is what the lambda does: for both vertices of each edge, it traverses two levels up the object tree, which yields the schema node.
The problem is in this lambda function. I cannot just do this:
it.get().getVertex(1).in('child').in('child').id().next().toString()
because getVertex(1) does not return a traversable instance, so I need to start a new traversal with g.V().... According to my debugging, this line causes a severe slowdown: the query gets about 50x slower if I leave this transformation in.
Do you have any ideas how to optimize this query?
You might consider not using a lambda at all, given they tend to not be portable between implementations. Perhaps the map step could be replaced with a project step something like:
project('v0','v1').
by(outV().in('child').in('child').id())
by(inV().in('child').in('child').id())
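To illustrate what that project step computes for each edge, here is a small plain-Python model. Everything in it is hypothetical (the PARENT tree, the object names, and the two helpers); it only mirrors the "go two levels up from each end of the edge" logic and does not talk to a graph.

```python
from typing import Dict, List, Tuple

# Hypothetical object tree: child -> parent (column -> table -> schema).
PARENT: Dict[str, str] = {
    "sales.orders.id": "sales.orders",
    "sales.orders.total": "sales.orders",
    "sales.orders": "sales",
    "dw.facts.order_id": "dw.facts",
    "dw.facts.amount": "dw.facts",
    "dw.facts": "dw",
}


def schema_of(column: str) -> str:
    """Walk two levels up the tree, like in('child').in('child')."""
    return PARENT[PARENT[column]]


def to_schema_edges(edges: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Mimic project('v0','v1') with one by() per end of each column-level edge."""
    return [{"v0": schema_of(src), "v1": schema_of(dst)} for src, dst in edges]
```

The point of doing this inside the traversal rather than in a lambda is that no new g.V() lookup is started per edge; the walk up the tree continues from the vertex the traverser already holds.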
I am trying to get the properties that have a given key or id with the following Gremlin.Net query, but the vertex info (id and label) in each VertexProperty is null in the result.
g.V().Properties<VertexProperty>().HasKey(somekey).Promise(p => p.ToList())
So I tried another way, but its return type is Path, and I had to write ugly code for the type conversion.
g.V().Properties<VertexProperty>().HasKey(somekey).Path().By(__.ValueMap<object, object>(true))
Is there a better way to achieve this?
I think basically the only thing missing to get what you want is the Project() step.
In order to find all vertices that have a certain property key and then get their id, label, and then all information about that property, you can use this traversal:
g.V().
Has(someKey).
Project<object>("vertexId", "vertexLabel", "property").
By(T.Id).
By(T.Label).
By(__.Properties<object>(someKey).ElementMap<object>()).
Promise(t => t.ToList());
Each result is a Dictionary whose keys are the arguments given to the Project step.
If you instead want to filter by a certain property id instead of a property key, then you can do it in a very similar way:
g.V().
Where(__.Properties<object>().HasId(propertyId)).
Project<object>("vertexId", "vertexLabel", "property").
By(T.Id).
By(T.Label).
By(__.Properties<object>(someKey).ElementMap<object>()).
Promise(t => t.ToList());
In both cases, the traversal first filters the vertices down to those that have the properties we are looking for. That way, we can use the Project() step afterwards to get the desired data back.
ElementMap should return all the information you want about those properties.
Note however that these traversals will most likely require a full graph scan in JanusGraph, meaning that it has to iterate over all vertices in your graph. The reason is that these traversals cannot use an index which would make them much more efficient. So, for larger graphs, the traversals will probably not be feasible.
If you had the vertex ids available instead of the property ids in the second traversal, then you could make the traversal a lot more efficient by replacing g.V().Where([...]) simply with g.V(id).
I have a traversal that ends with a drop() to delete a vertex. I would like to be able to tell the difference between the drop() removing a vertex and the traversal just not matching anything.
I tried adding an alias to one of the earlier nodes and select()ing it at the end of the traversal, but that doesn't return anything even when the traversal does match the graph.
e.g.
g.V('id', '1').as('flag')
.out('has_child')
.drop()
.select('flag')
.toList()
The trick is that drop() is a filter step so it removes objects from the traversal stream. You can work around that situation a bit by dropping by sideEffect():
gremlin> g.V().has('person','name','marko')
==>v[1]
gremlin> g.V().has('person','name','marko').sideEffect(drop())
==>v[1]
gremlin> g.V().has('person','name','marko')
gremlin>
The return of the vertex would mean that it was present and dropped, but if no value is returned then it wasn't present in the first place to be dropped.
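The difference between the two forms can be modelled with ordinary Python iterators. This is just a sketch of the stream behaviour, assuming a plain set as a stand-in for the graph; the two helper functions are hypothetical and not part of any Gremlin API.

```python
from typing import Iterable, Iterator, Set


def drop(store: Set[str], stream: Iterable[str]) -> Iterator[str]:
    """Mimic drop(): delete each element and emit nothing downstream."""
    for element in stream:
        store.discard(element)
    return iter(())


def side_effect_drop(store: Set[str], stream: Iterable[str]) -> Iterator[str]:
    """Mimic sideEffect(drop()): delete each element but still pass it along."""
    for element in stream:
        store.discard(element)
        yield element
```

In this model, list(drop(store, ["marko"])) is always empty, so a caller cannot tell a match from a miss, while list(side_effect_drop(store, ["marko"])) returns the deleted element and an empty input yields an empty result.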
I need to write a single Gremlin query that can set the new property values of a vertex. All the property names are known in advance (in this example: Type, Country, Status). Some of the property values can be null - and I don't know which ones in advance. The query should work for all cases. For example, let's say I currently have this query:
g.V(123).
property('Type',Type).
property('Country',Country).
property('Status',Status)
This query works fine if all the parameter (Type, Country, Status) values are non-null. If, say, Country is null, I get an error:
The AddPropertyStep does not have a provided value: AddPropertyStep({key=[Country]})
In such a case I would need to use a different query to drop the property (by the way, is there a better way to drop a property?):
g.V(123).
property('Type',Type).
property('Status',Status).
properties('Country').drop()
Is it possible to write a universal query that can handle both null and non-null values? I cannot use console or programming, just a single Gremlin query to be executed.
Thanks!
TinkerPop doesn't allow null values in properties (though you might find some graph databases allowing different semantics there, I suppose), so you should validate your data up front to ensure that it has some meaningful "empty value" as opposed to a null. If you can't do that for some reason, I guess you could use choose() step to "check for null":
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().has('person','name','marko').valueMap()
==>[name:[marko],age:[29]]
gremlin> age = null
gremlin> g.V().has('person','name','marko').choose(constant(age).count().is(0), properties('age').drop(),property('age',age))
gremlin> g.V().has('person','name','marko').valueMap()
==>[name:[marko]]
gremlin> age = 30
==>30
gremlin> g.V().has('person','name','marko').choose(constant(age).count().is(0), properties('age').drop(),property('age',age))
==>v[1]
gremlin> g.V().has('person','name','marko').valueMap()
==>[name:[marko],age:[30]]
The check for "is null" is basically just: constant(age).count().is(0), which leans on Gremlin's semantics for null values in a stream being empty and giving a count() of zero. It works, but it makes your Gremlin a little less readable. That might be a nice DSL step to add if you have to write that a lot.
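The branching that choose() performs here amounts to "drop the property if the value is null, otherwise set it". A minimal plain-Python sketch of that logic, with a dict standing in for the vertex and a hypothetical set_or_drop helper, looks like this:

```python
from typing import Any, Dict, Optional


def set_or_drop(vertex: Dict[str, Any], key: str, value: Optional[Any]) -> Dict[str, Any]:
    """Mimic choose(<value is null>, properties(key).drop(), property(key, value))."""
    if value is None:
        vertex.pop(key, None)  # drop the property; no error if it is absent
    else:
        vertex[key] = value
    return vertex
```

Applied once per property name (Type, Country, Status in the question), the same call handles null and non-null values alike. One difference from the console session above: this toy always returns the vertex, whereas the Gremlin version returns nothing when the drop branch is taken.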
I cannot select a specific vertex by executing g.V(3640).valueMap(true).unfold(). No command that passes an ID inside the parentheses of g.V() seems to work.
This is what I did:
I'm new to Graph databases and experimenting with the Gremlin console. I started by creating an instance:
graph = TinkerGraph.open()
g=graph.traversal()
and loading sample data by importing a .graphml database file:
g.io(graphml()).readGraph('/full/path/to/air-routes-latest.graphml')
which seemed to work fine because a count gives a nice result back
gremlin> g.V().count()
==>3642
Unfortunately the following does not work:
gremlin> g.V(3640).valueMap(true).unfold()
Which I think is odd, because by executing the following
gremlin> g.V()
==>v[3640]
==>v[2306]
...
the ID does seem to exist. Any ideas why I cannot access a specific ID? I tried different commands, but g.V() seems to work fine and g.V(3640) does not. Is it because I use TinkerGraph instead of a Gremlin database, or what might be the problem?
EDIT:
It seems that my ids were saved as strings, because g.V("2").valueMap(true).unfold() does give me results.
I think you likely have an issue with the "type" of the identifier. I suspect that if you do:
g.V(3640L)
that you will get the vertex you want. By default, TinkerGraph handles id equality with equals(), so if you try to find an integer when the id is a long, it will act like it's not there. You can modify that default if you like with an IdManager configuration, which is described in the TinkerGraph section of the TinkerPop Reference Documentation. Note that this is also discussed in more detail in Practical Gremlin.
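The effect of strict id equality, and of normalising ids on the way in and out, can be shown with a tiny stand-in class in plain Python. ToyGraph and its normalize parameter are hypothetical and only loosely analogous to what an IdManager does; Python has no int/long split, so the sketch uses the string-versus-number mismatch from the question's edit instead.

```python
from typing import Any, Callable, Dict, Optional


class ToyGraph:
    """Hypothetical stand-in for a graph with a pluggable id conversion."""

    def __init__(self, normalize: Optional[Callable[[Any], Any]] = None) -> None:
        self._vertices: Dict[Any, Dict[str, Any]] = {}
        # Without a normalizer, ids are compared exactly as given.
        self._normalize = normalize or (lambda raw: raw)

    def add_vertex(self, vertex_id: Any, **props: Any) -> None:
        self._vertices[self._normalize(vertex_id)] = props

    def v(self, vertex_id: Any) -> Optional[Dict[str, Any]]:
        """Look up a vertex by id; returns None when nothing matches."""
        return self._vertices.get(self._normalize(vertex_id))
```

A graph built as ToyGraph() and loaded with the id "3640" finds nothing for v(3640), while ToyGraph(normalize=str) converts both the stored id and the lookup id to a string and finds the vertex either way.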