Gremlin `elementMap()` step returns fewer elements than are actually present - graph

I have an application with more than 3000 vertices that share the same label, let's say ABC. The application needs to list all of these vertices and their properties so that the user can choose an entity and interact with it. For that I am writing a GetAllVertices query for the label ABC.
The IDs of the vertices are numbers.
Ex: 1,2,3,..
The following query returns the correct number of vertices (about 3000):
g.V().hasLabel('ABC').dedup().count()
The following query, however, returns only around 1600 entries:
g.V().hasLabel('ABC').elementMap()
I am trying to understand what is happening and how I can get the elementMap for all the vertices I am interested in. My guess is that elementMap() uses a hash function internally, that some keys collide, and that colliding keys are overwritten by different entries.

Using TinkerGraph I am not able to reproduce this behavior.
gremlin> g.inject(0).repeat(addV('ABC').property(id,loops())).times(3000)
==>v[2999]
gremlin> g.V().hasLabel('ABC').count()
==>3000
gremlin> g.V().hasLabel('ABC').elementMap().count()
==>3000
If you can say more about the data in your graph I can do some additional tests and try to reproduce what you are seeing.
UPDATED 2022-08-03
I ran the same test on Amazon Neptune version 1.1.1.0.R4 from a Neptune notebook, and it worked there as well.
%%gremlin
g.inject(0).repeat(addV('ABC').property('p1',loops())).times(3000)
v[a6c131cc-42e8-3713-c82d-faa193b118a0]
%%gremlin
g.V().hasLabel('ABC').count()
3000
%%gremlin
g.V().hasLabel('ABC').elementMap().count()
3000

Related

Azure cosmosDB gremlin - how to update vertex property with another vertex's property

On an Azure CosmosDB Gremlin instance,
I have 2 vertices A and B linked by an edge E.
Both vertices have a 'name' property.
I'd like to run a query which will take A's name and put it in B.
When I run
g.V("AId").as("a").oute().inv().hasLabel('B').property("name",select('a').values('name'))
I get the following error:
GraphRuntimeException ExceptionMessage : Gremlin Query Execution Error: Cannot create ValueField on non-primitive type GraphTraversal.
It looks like the select operator is not correctly used.
Thank you for your help
EDITED based on discussion in comments
You have oute and inv in lower case. In general, the steps use camelCase naming, such as outE and inV (outside of specific GLVs), but in the comments it was mentioned that CosmosDB will accept all lower case step names. Assuming, therefore, that this is not the issue here, the query as written looks fine in terms of generic Gremlin. The example below was run using TinkerGraph, and uses the same select mechanism to pick the property value.
gremlin> g.V(3).as("a").outE().inV().has('code','LHR').property("name",select('a').values('city'))
==>v[49]
gremlin> g.V(49).values('name')
==>Austin
What you are observing may be specific to CosmosDB and it's probably worth contacting their support folks to double check.

The provided traversal does not map to a value

I am trying to execute a math query.
gts.V()
    .hasLabel("account")
    .has("id", 42)
    .both("account_label1").as("label1")
    .and(__.identity()
        .project("a", "b")
            .by(__.identity()
                .both("label1_label2")
                .both("label2_label3")
                .values("createTime"))
            .by(__.identity()
                .both("label1_label4")
                .both("label4_label5")
                .values("launchTime"))
        .math("floor((a-b)/(86400))").is(100))
    .select("label1")
    .toList()
The above query fails with the error:
The provided traverser does not map to a value: v[137]->[IdentityStep, VertexStep(BOTH,[label1_label2],vertex), VertexStep(BOTH,[label2_label3],vertex), NoOpBarrierStep(2500), PropertiesStep([createTime],value)]
Why is Gremlin injecting NoOpBarrierStep?
What is the meaning of NoOpBarrierStep(2500)?
What would the correct Gremlin query be?
When you use project() it expects a value for each by() modulator and that value should not produce an empty Iterator. Here's a simple example:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().project('x').by(out())
==>[x:v[3]]
The provided traverser does not map to a value: v[2]->[VertexStep(OUT,vertex)]
Type ':help' or ':h' for help.
Display stack trace? [yN]
The first vertex is able to traverse out() but the next one processed by project() has no outgoing edges and therefore produces this error. In your case, that simply means that not all of your traversers can traverse both().both() or if they can, you would want to be sure that they all had "createTime" property values. Either of those scenarios could cause the problem.
You could fix this in a variety of ways. Obviously, if it's a data problem you could simply fix your data and always assume that the traversal path is right. If that's not the case, you need to write your Gremlin to be a bit more forgiving if the traversal path is not available. In my case I could do:
gremlin> g.V().project('x').by(out().fold())
==>[x:[v[3],v[2],v[4]]]
==>[x:[]]
==>[x:[]]
==>[x:[v[5],v[3]]]
==>[x:[]]
==>[x:[v[3]]]
Perhaps in your case you might do:
by(coalesce(both("label1_label2").both("label2_label3").values("createTime"),
            constant('n/a')))
Note that you do not need to specify identity() for the start of your anonymous traversals.
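To make the difference between these by() variants concrete, here is a minimal Python sketch. It is a toy model and not TinkerPop code: the function names by_first, by_fold and by_coalesce and the out_edges data are hypothetical, chosen only to mirror by(out()), by(out().fold()) and by(coalesce(..., constant(...))).

```python
# Toy model (not TinkerPop itself) of how project() treats a by() modulator.

def by_first(results):
    """Model of by(traversal): take the first result, fail if there is none."""
    for value in results:
        return value
    raise ValueError("The provided traverser does not map to a value")

def by_fold(results):
    """Model of by(traversal.fold()): always one list, possibly empty."""
    return list(results)

def by_coalesce(results, default):
    """Model of by(coalesce(traversal, constant(default)))."""
    for value in results:
        return value
    return default

# Hypothetical adjacency data: vertex id -> outgoing neighbours.
out_edges = {1: [3, 2, 4], 2: [], 4: [5, 3]}

for vertex in out_edges:
    print(vertex, by_fold(out_edges[vertex]), by_coalesce(out_edges[vertex], "n/a"))
```

In this model only by_first can fail, and it fails exactly for the vertex with no outgoing neighbours, which is the situation the error message describes.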
Finally, in answer to your questions about NoOpBarrierStep, that step is injected into traversals where Gremlin thinks it can take advantage of a bulking optimization. You can add them yourself with barrier() step as well. Here's a quick description of "bulking" as taken from the TinkerPop Reference Documentation:
The theory behind a "bulking optimization" is simple. If there are one million traversers at vertex 1, then there is no need to calculate one million both()-computations. Instead, represent those one million traversers as a single traverser with a Traverser.bulk() equal to one million and execute both() once.
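The idea in that quote can be sketched in a few lines of Python. This is an illustration under assumptions, not how TinkerPop implements it: the neighbours graph and the function names both_naive and both_bulked are made up for the example.

```python
from collections import Counter

# Hypothetical toy graph: vertex -> vertices reached by both().
neighbours = {1: [2, 3], 2: [1], 3: [1]}

def both_naive(traversers):
    """Run one both() computation per traverser."""
    out, computations = Counter(), 0
    for vertex in traversers:
        computations += 1
        for n in neighbours[vertex]:
            out[n] += 1
    return out, computations

def both_bulked(traversers):
    """Merge equal traversers first, then run both() once per distinct vertex."""
    bulks = Counter(traversers)  # vertex -> bulk
    out, computations = Counter(), 0
    for vertex, bulk in bulks.items():
        computations += 1
        for n in neighbours[vertex]:
            out[n] += bulk  # each result carries the whole bulk
    return out, computations

traversers = [1] * 1000 + [2] * 10
naive_result, naive_cost = both_naive(traversers)
bulked_result, bulked_cost = both_bulked(traversers)
print(naive_cost, bulked_cost, naive_result == bulked_result)
```

Both functions produce the same multiset of results; the bulked version just does the neighbour lookup once per distinct vertex instead of once per traverser.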

Cannot access specific vertex by ID, using TinkerGraph in Gremlin Console

I cannot select a specific vertex, by executing g.V(3640).valueMap(true).unfold(). Any command which contains an ID between the parentheses in the g.V() command does not seem to work.
This is what I did:
I'm new to Graph databases and experimenting with the Gremlin console. I started by creating an instance:
graph = TinkerGraph.open()
g=graph.traversal()
and loading sample data by importing a .graphml database file:
g.io(graphml()).readGraph('/full/path/to/air-routes-latest.graphml')
which seemed to work fine because a count gives a nice result back
gremlin> g.V().count()
==>3642
Unfortunately the following does not work:
gremlin> g.V(3640).valueMap(true).unfold()
Which I think is odd, because by executing the following
gremlin> g.V()
==>v[3640]
==>v[2306]
...
the ID does seem to exist. Any ideas why I cannot access a specific ID? I tried different commands, but g.V() seems to work fine and g.V(3640) does not. Is it because I use TinkerGraph instead of a Gremlin database, or what might be the problem?
EDIT:
It seems that my IDs were saved as strings, because g.V("2").valueMap(true).unfold() does give me results.
I think you likely have an issue with the "type" of the identifier. I suspect that if you do:
g.V(3640L)
that you will get the vertex you want. By default, TinkerGraph handles id equality with equals() so if you try to find an integer when the id is a long it will act like it's not there. You can modify that default if you like with an IdManager configuration discussed here. Note that this is also discussed in more detail in Practical Gremlin.
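The effect of strict id equality can be illustrated with a small Python sketch. This is only an analogy: Python has no int/long distinction, so a string id versus an integer id stands in for the type mismatch, and store, lookup_strict and lookup_coerced are hypothetical names, not TinkerGraph APIs. The coercing lookup is loosely comparable to what an IdManager does.

```python
# Toy id-keyed vertex store; ids were loaded as strings, as in the EDIT above.
store = {"2": "v[2]", "3640": "v[3640]"}

def lookup_strict(store, vertex_id):
    """Find a vertex only if the id matches in both type and value."""
    return store.get(vertex_id)

def lookup_coerced(store, vertex_id, id_type=str):
    """Convert the requested id to the stored id type before looking it up."""
    return store.get(id_type(vertex_id))

print(lookup_strict(store, 3640))    # the integer does not equal the string key
print(lookup_strict(store, "3640"))  # same type and value
print(lookup_coerced(store, 3640))   # coerced to a string first
```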

Simple outer-join like gremlin query not returning any results

I wrote the simple query below to traverse from Person to Country, but it’s not returning any results.
g.V().hasLabel("Person").as("p").out("from").hasLabel("Country").as("c").select("p", "c")
In the actual data, only Person vertices exist; there are no Country vertices or from edges. I expected it to at least return p - basically I want to do a left outer join. However, if I have Country and from data as well, the query returns results.
I tried another query using match as well but still no results unless there are actual data:
g.V().hasLabel("Person").has("name","bob").match(__.as("p").out("from").hasLabel("Country").as("c")).select("p", "c")
I'm running these queries against DataStax Enterprise Graph.
Any idea why it’s returning no results?
The result you are getting is expected. If there are no "from" edges then the traverser essentially dies and does not proceed any further. Perhaps you could consider using project():
g.V().hasLabel("Person").
  project('name','country').
    by('name').
    by(out('from').hasLabel('Country').values('name').fold())
With the "modern" toy graph in TinkerPop, the output looks like this:
gremlin> g.V().hasLabel('person').project('name','knows').by().by(out('knows').values('name').fold())
==>[name:v[1],knows:[vadas,josh]]
==>[name:v[2],knows:[]]
==>[name:v[4],knows:[]]
==>[name:v[6],knows:[]]
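The difference between the two query shapes can be modelled in plain Python. This is a rough sketch with made-up data (people, from_edges) and hypothetical function names, not Gremlin itself: inner_style mimics as('p').out('from').as('c').select('p','c'), and outer_style mimics the project() form with fold().

```python
# Hypothetical data: Person names and their outgoing "from" edges.
people = ["bob", "alice"]
from_edges = {"alice": ["Norway"]}

def inner_style(people, from_edges):
    """A person with no 'from' edge produces no row at all."""
    return [{"p": p, "c": c} for p in people for c in from_edges.get(p, [])]

def outer_style(people, from_edges):
    """Every person produces one row; missing countries become an empty list."""
    return [{"name": p, "country": list(from_edges.get(p, []))} for p in people]

print(inner_style(people, {}))
print(outer_style(people, {}))
print(inner_style(people, from_edges))
```

With no edges at all, inner_style returns nothing while outer_style still returns one row per person, which is the left outer join behaviour the question asks for.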
In the future, when you submit questions about Gremlin, please include a Gremlin script that can be pasted into a Gremlin Console which makes it easier to try to more directly answer your specific question.

Key index on Bulbs / Gremlin / Titan

I'm trying to port my Neo4j application to Titan and I'm having some issues related to indexes.
I understand that Titan does not support vertex or edge indexes, only "key" indexes. Is that right?
I'm also working with Bulbs models, for example:
class Person(Node):
    element_type = 'person'
    facebook_id = String(indexed=True)
It should be possible when adding Person(facebook_id='111') to retrieve using:
gremlin> g.getVertices('facebook_id', '111')
It doesn't work and tells me that I need to create the key index before using it. So I dropped the keyspace and manually created the index in rexster doghouse:
gremlin> g.createKeyIndex("facebook_id", Vertex.class);
After that, I created Person(facebook_id='111') with Bulbs and tried to retrieve it on rexster doghouse:
gremlin> g.getVertices("facebook_id", "111")
And got an empty response. Fetching by the Titan vertex ID works, but "facebook_id" comes back empty and ".map()" doesn't work:
gremlin> g.v(4)
==>v[4]
gremlin> g.v(4).name
==>Renato Garcia Pedigoni
gremlin> g.v(4).facebook_id # nothing returned!
gremlin> g.v(4).map()
==>javax.script.ScriptException: java.lang.IllegalArgumentException: The value is already used by another vertex and the key is unique
PS: It's the first vertex I created after dropping the keyspace.
Is it possible to create key indexes automatically?
Any tips?
Thanks!
Renato Pedigoni
Yes, Titan only supports key indexes which replace the old manual vertex indexes with similar functionality but less overhead.
The exception indicates that the property is not only indexed but also unique (see Titan Types for more information).
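As a rough illustration of what "indexed and unique" means, here is a toy Python model. It is not Titan's implementation: the class UniqueKeyIndex and its methods are hypothetical, and only the error text is taken from the exception above.

```python
class UniqueKeyIndex:
    """Toy unique key index: each value may belong to at most one vertex."""

    def __init__(self, key):
        self.key = key
        self._owner = {}  # property value -> vertex id

    def put(self, vertex_id, value):
        owner = self._owner.get(value)
        if owner is not None and owner != vertex_id:
            raise ValueError(
                "The value is already used by another vertex and the key is unique"
            )
        self._owner[value] = vertex_id

    def get(self, value):
        """Return the vertex id that holds this value, or None."""
        return self._owner.get(value)

index = UniqueKeyIndex("facebook_id")
index.put(4, "111")
print(index.get("111"))
```

In this model, writing the same value for a second vertex raises the error, while rewriting it for the same vertex does not.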
Have you tried adding the vertex and key index in Gremlin (i.e. without Bulbs)?
Also, James has done a lot of work on Bulbs with respect to Titan integration, so this particular issue might be resolved in the most current version.

Resources