Key index on Bulbs / Gremlin / Titan - graph

I'm trying to port my Neo4J application to Titan and I'm having some issues related to indexes.
I understand that Titan does not support vertex or edge indexes, only "key" indexes. Is that right?
I'm also working with Bulbs models, for example:
class Person(Node):
element_type = 'person'
facebook_id = String(indexed=True)
After adding Person(facebook_id='111'), it should be possible to retrieve it using:
gremlin> g.getVertices('facebook_id', '111')
This doesn't work; Titan tells me that I need to create the key index before using it. So I dropped the keyspace and manually created the index in the Rexster Dog House:
gremlin> g.createKeyIndex("facebook_id", Vertex.class);
After that, I created Person(facebook_id='111') with Bulbs and tried to retrieve it in the Rexster Dog House:
gremlin> g.getVertices("facebook_id", "111")
I got an empty response. Fetching by the Titan vertex ID works, but "facebook_id" comes back empty and ".map()" fails:
gremlin> g.v(4)
==>v[4]
gremlin> g.v(4).name
==>Renato Garcia Pedigoni
gremlin> g.v(4).facebook_id // nothing returned!
gremlin> g.v(4).map()
==>javax.script.ScriptException: java.lang.IllegalArgumentException: The value is already used by another vertex and the key is unique
PS: It's the first vertex I created after dropping the keyspace.
Is it possible to create key indexes automatically?
Any tips?
Thanks!
Renato Pedigoni

Yes, Titan only supports key indexes, which replace the old manual vertex indexes with similar functionality but less overhead.
The exception indicates that the property is not only indexed but also unique (see Titan Types for more information).
Have you tried adding the vertex and key index in Gremlin (i.e. without Bulbs)?
Also, James has done a lot of work on Bulbs with respect to Titan integration, so this particular issue might be resolved in the most current version.
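To make the exception above easier to reason about, here is a minimal Python sketch of how a unique key behaves. It is a toy in-memory model, not Titan's or Bulbs' implementation: `ToyGraph`, `UniqueKeyViolation`, `add_vertex`, and `lookup` are hypothetical names invented for this illustration.

```python
class UniqueKeyViolation(ValueError):
    """Raised when a unique key's value already belongs to another vertex."""


class ToyGraph:
    """Hypothetical in-memory stand-in for a graph with unique keys."""

    def __init__(self, unique_keys=()):
        self.vertices = {}  # vertex id -> property dict
        # One value -> vertex id map per unique key
        self.unique = {key: {} for key in unique_keys}
        self._next_id = 1

    def add_vertex(self, **props):
        # Validate every unique key first so a failure leaves no partial state
        for key, owners in self.unique.items():
            if key in props and props[key] in owners:
                raise UniqueKeyViolation(
                    f"value {props[key]!r} for unique key {key!r} is already used"
                )
        vid = self._next_id
        self._next_id += 1
        for key, owners in self.unique.items():
            if key in props:
                owners[props[key]] = vid
        self.vertices[vid] = dict(props)
        return vid

    def lookup(self, key, value):
        """Return the id of the vertex holding value under a unique key, or None."""
        return self.unique[key].get(value)


g = ToyGraph(unique_keys=["facebook_id"])
first = g.add_vertex(name="Renato", facebook_id="111")
try:
    g.add_vertex(name="Someone else", facebook_id="111")
except UniqueKeyViolation as err:
    print("rejected:", err)
```

In this model the second insert is rejected because the value is already owned by another vertex. If the real exception appears for what should be the first vertex, one thing to check is whether the same value was written more than once during creation.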

Related

Gremlin `elementMap()` step returns fewer elements than are actually present

I have an application with more than 3000 vertices that share the same label, let's say ABC. My application needs to list all of these vertices and their properties so that the user can choose an entity and interact with it. For that I am writing a GetAllVertices query for label ABC.
The IDs of the vertices are numbers, e.g. 1,2,3,..
The following query returns the correct amount of vertices ~ 3000
g.V().hasLabel('ABC').dedup().count()
The following query however only returns around 1600 entries
g.V().hasLabel('ABC').elementMap()
I am trying to understand what is happening and how I can get the elementMap for all the vertices I am interested in. My guess is that elementMap() uses a hash function whose key collisions cause some entries to be overwritten by others.
Using TinkerGraph I am not able to reproduce this behavior.
gremlin> g.inject(0).repeat(addV('ABC').property(id,loops())).times(3000)
==>v[2999]
gremlin> g.V().hasLabel('ABC').count()
==>3000
gremlin> g.V().hasLabel('ABC').elementMap().count()
==>3000
If you can say more about the data in your graph I can do some additional tests and try to reproduce what you are seeing.
UPDATED 2022-08-03
I ran the same test on Amazon Neptune version 1.1.1.0.R4 from a Neptune notebook, and it worked there as well.
%%gremlin
g.inject(0).repeat(addV('ABC').property('p1',loops())).times(3000)
v[a6c131cc-42e8-3713-c82d-faa193b118a0]
%%gremlin
g.V().hasLabel('ABC').count()
3000
%%gremlin
g.V().hasLabel('ABC').elementMap().count()
3000

Cannot access specific vertex by ID, using TinkerGraph in Gremlin Console

I cannot select a specific vertex, by executing g.V(3640).valueMap(true).unfold(). Any command which contains an ID between the parentheses in the g.V() command does not seem to work.
This is what I did:
I'm new to Graph databases and experimenting with the Gremlin console. I started by creating an instance:
graph = TinkerGraph.open()
g=graph.traversal()
and loading sample data by importing a .graphml database file:
g.io(graphml()).readGraph('/full/path/to/air-routes-latest.graphml')
which seemed to work fine because a count gives a nice result back
gremlin> g.V().count()
==>3642
Unfortunately the following does not work:
gremlin> g.V(3640).valueMap(true).unfold()
Which I think is odd, because by executing the following
gremlin> g.V()
==>v[3640]
==>v[2306]
...
the ID does seem to exist. Any ideas why I cannot access a specific ID? I tried different commands, but g.V() seems to work fine and g.V(3640) does not. Is it because I use TinkerGraph instead of a Gremlin database, or what might be the problem?
EDIT:
It seems that my IDs were saved as strings, because g.V("2").valueMap(true).unfold() does give me results.
I think you likely have an issue with the "type" of the identifier. I suspect that if you do:
g.V(3640L)
that you will get the vertex you want. By default, TinkerGraph handles id equality with equals(), so if you try to find an integer when the id is a long, it will act as if the vertex is not there. You can change that default with an IdManager configuration, which is described in the TinkerGraph section of the TinkerPop reference documentation. Note that this is also discussed in more detail in Practical Gremlin.
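The effect of type-sensitive id matching can be shown with a small Python analogy. This is only a sketch, not TinkerGraph code: the `store` dictionary and `find_vertex` helper are hypothetical, and the stored property is a placeholder. It mirrors the case from the EDIT above, where the ids turned out to be strings. On the JVM the same kind of mismatch occurs between Integer and Long, which Python cannot reproduce directly because it has a single int type.

```python
def find_vertex(store, vertex_id):
    """Return the properties stored under vertex_id, or None if no key equals it."""
    return store.get(vertex_id)


# Hypothetical store whose ids were loaded as strings
store = {"3640": {"label": "example"}}

missing = find_vertex(store, 3640)       # the int 3640 is not equal to the str "3640"
found = find_vertex(store, "3640")       # same type as the stored id
coerced = find_vertex(store, str(3640))  # convert the id to the stored type first

print(missing, found, coerced)
```

The lookup succeeds only when the id passed in has the same type as the id that was stored, which is why converting the id before the lookup resolves the problem.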

How to use a UUID as id in Gremlin?

I'm adding vertices like this:
g.addV("foobar").property("id", 1).property(...etc...
How can I set a property with a uuid instead of an integer id?
An "id" can have multiple meanings. If you simply mean that you want to use a UUID as a unique identifier to look up your vertices, then the approach you have taken is fine when used in conjunction with the indexing functionality of your chosen graph database. In other words, as long as you have an index on "id", you will quickly find your vertex. In this usage, "id" is just a property of the vertex. For certain graph databases, however, "id" is a reserved term and can't be used as a property key, so it is likely best to choose a different key name.
If instead of using "id" as a property key, you mean that you wish to set the actual vertex identifier, referred to by T.id, as in:
g.addV(T.id, uuid)
then you first need to use a graph database implementation that allows the assignment of identifiers. TinkerGraph is one such implementation. In this way, you natively assign the identifier of the vertex rather than allowing the graph database to create it for you.
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV(id, UUID.randomUUID())
==>v[c2d673de-2425-4b42-bc1e-68ff20e3b0a8]
gremlin> g.V(UUID.fromString("c2d673de-2425-4b42-bc1e-68ff20e3b0a8"))
==>v[c2d673de-2425-4b42-bc1e-68ff20e3b0a8]
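The same round trip (generate a UUID, print it, parse it back, look up by the parsed value) can be sketched in plain Python with the standard `uuid` module. The `vertices` dictionary is a hypothetical stand-in for a graph that accepts user-assigned identifiers; it is not a Gremlin or TinkerGraph API.

```python
import uuid

# Generate a random UUID to use as the vertex identifier
new_id = uuid.uuid4()

# The printed form is a 36-character string; parsing it yields an equal UUID
text = str(new_id)
parsed = uuid.UUID(text)

# Hypothetical vertex store keyed by the UUID object itself
vertices = {new_id: {"label": "foobar"}}

by_uuid = vertices.get(parsed)  # found: parsed == new_id
by_text = vertices.get(text)    # None: a str is never equal to a UUID object

print(by_uuid, by_text)
```

As in the console session above, the lookup needs the identifier in its native type: the string form has to be parsed back into a UUID before it will match.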

Check successful loading of data into cassandra by gremlin

I have Cassandra and Titan set up and running, and Gremlin also works fine. I have connected Gremlin to Cassandra using:
gremlin> conf = new BaseConfiguration();
gremlin> conf.setProperty('storage.backend', 'cassandra');
gremlin> conf.setProperty('storage.hostname', '192.168.14.129');
gremlin> conf.setProperty('storage.keyspace', 'test');
gremlin> g = TitanFactory.open(conf);
And I have created a vertex:
gremlin> v1 = g.addVertex(label,"person","f_name","Anna");
==>v[8424]
How do I check whether this data has been written to Cassandra in the test keyspace (which already exists in Cassandra)?
TinkerPop v3.x distinguishes between a Graph and a TraversalSource.
You should be doing the following only once:
graph = TitanFactory.open(conf)
g = graph.traversal()
Then execute all your traversals with:
g.V().some(...).gremlin(...).steps(...)
To find a Vertex by its id in Titan, you may have to cast the id to a Long. Assuming a Vertex with id 8424L, you can do:
g.V(8424L) // returns a traversal
g.V(8424L).next() // returns that vertex
You shouldn't be calling graph.traversal() more than once, as you get a performance hit every time. In the default Titan v1.0.0 setup, notice how the traversal initialization is done when starting Gremlin server (see conf/gremlin-server/gremlin-server.yaml which executes the scripts/empty-sample.groovy file).
According to your comment below, you want to retrieve the vertex you just added. Please refer to the TinkerPop reference. Using your g as the notation for the graph, it is as simple as:
gremlin> g.traversal().V(8424)
To get the properties of the vertex, read about the valueMap step in the reference. To obtain the vertex by its properties rather than by its id, read about the has step. Be aware that I linked to the reference for TinkerPop 3.2.0. You might choose a different version of that document to match the version of your stack.

Tinkerpop - how do I find a node in the graph?

I can't seem to find a specific node in the graph without traversing the whole thing. Is there something I'm missing?
I'm using tinkerpop blueprints.
OrientDB assigns an opaque id such as '#8:1' to each node. How do I find a node without knowing that id? The vertex has a property like 'user=jason' that identifies it.
I'm thinking I'll just use Redis to store the user/location pair, or otherwise use a supernode (no thanks).
Blueprints has the notion of key indices.
https://github.com/tinkerpop/blueprints/wiki/Graph-Indices
Given your example, define a key index for "user", then query it with the key index. Here's an example using OrientDB from a Gremlin prompt:
gremlin> g = new OrientGraph("memory://graph")
==>orientgraph[memory://graph]
gremlin> g.createKeyIndex("user", Vertex.class)
==>null
gremlin> g.addVertex([user:"Jason"])
==>v[#8:-3]
gremlin> g.addVertex([user:"Rick"])
==>v[#8:-4]
gremlin> g.stopTransaction(SUCCESS)
==>null
gremlin> g.V('user','Jason')
==>v[#8:1]
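Why a key index avoids traversing the whole graph can be sketched with a toy Python model. This is an illustration only, not the Blueprints or OrientDB implementation: `KeyIndexedGraph` and its methods are hypothetical names. The index maps each value of a key to the ids of the vertices holding it, so a lookup is a dictionary access rather than a scan.

```python
from collections import defaultdict


class KeyIndexedGraph:
    """Hypothetical in-memory graph with optional per-key indexes."""

    def __init__(self):
        self.vertices = {}  # vertex id -> property dict
        self.indexes = {}   # key -> {value -> set of vertex ids}
        self._next_id = 0

    def create_key_index(self, key):
        # Build the index over any vertices that already exist
        index = defaultdict(set)
        for vid, props in self.vertices.items():
            if key in props:
                index[props[key]].add(vid)
        self.indexes[key] = index

    def add_vertex(self, **props):
        vid = self._next_id
        self._next_id += 1
        self.vertices[vid] = props
        # Keep every existing index up to date on write
        for key, index in self.indexes.items():
            if key in props:
                index[props[key]].add(vid)
        return vid

    def find(self, key, value):
        if key in self.indexes:
            # Indexed key: direct lookup, no scan
            return set(self.indexes[key].get(value, ()))
        # Unindexed key: fall back to scanning every vertex
        return {vid for vid, p in self.vertices.items()
                if key in p and p[key] == value}


g = KeyIndexedGraph()
g.create_key_index("user")
jason = g.add_vertex(user="Jason")
rick = g.add_vertex(user="Rick")
print(g.find("user", "Jason"))
```

The design trade-off in this sketch is the usual one for indexes: each write does a little extra work to maintain the index so that reads on that key no longer depend on the number of vertices.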
