JanusGraph/TinkerPop - Constraint violation - How to add or update an existing Vertex

I have defined constraints in my schema to ensure uniqueness of given vertices based on one or more properties. For example:
mgmt.buildIndex('byTenandIdUnique',Vertex.class).addKey(tenantId).unique().buildCompositeIndex()
As expected now, when I try to add a Vertex that already exists, I am getting an error like below:
aiogremlin.exception.GremlinServerError: 500: Adding this property for key [tenantId] and value [ACME2_AX2] violates a uniqueness constraint [byTenandIdUnique]
I am writing a Python application to load log files, with Goblin OGM, so it is expected that the data will repeat, and I don't want multiple instances of the same Vertex, hence the constraint.
Is there a way in TinkerPop or JanusGraph to update a Vertex in case it already exists instead of throwing this exception? Or is this something that the OGM should handle or maybe the code itself by querying the graph before any transaction?

TinkerPop does not do anything to enforce a schema, so the schema restriction here is specific to JanusGraph. The behavior is as you described: if you have a unique index defined and then attempt to add another element that conflicts with an existing element, an exception is thrown.
From a JanusGraph perspective, your logic will need to account for this properly. The code below is based on a common recipe using the coalesce() step that you can read more about here.
// check for existence of a vertex with the tenantId property
// if the vertex exists, return that vertex
// else create a new vertex with the tenantId
v = g.V().has("tenantId", "ACME2_AX2").fold(). \
coalesce( __.unfold(), __.addV().property("tenantId", "ACME2_AX2") ). \
next();
I don't use Goblin, so I'm not aware of whether Goblin is capable of handling this or whether it passes that responsibility on to the app developer, but checking for existence before setting the property is still an appropriate way to handle the situation.
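For the Python loader in the question, one way to apply the recipe is to send it to the server as a parameterized script. The following is only a minimal sketch, not Goblin's API: the helper name build_upsert_script, the binding names and the lastSeen property are hypothetical, and it assumes the client in use can submit a script string together with bindings. Steps chained after coalesce() run on whichever vertex it returns, so the extra property() calls update an existing vertex and initialize a new one alike.

```python
from typing import Any, Dict, Tuple


def build_upsert_script(
    key: str, value: Any, updates: Dict[str, Any]
) -> Tuple[str, Dict[str, Any]]:
    """Build a get-or-create Gremlin script plus bindings (hypothetical helper).

    Property names are interpolated into the script text, so they must be
    trusted identifiers; all values travel separately as bindings.
    """
    for name in [key, *updates]:
        if not name.isidentifier():
            raise ValueError(f"unsafe property name: {name!r}")

    bindings: Dict[str, Any] = {"keyValue": value}
    script = (
        f"g.V().has('{key}', keyValue).fold()"
        f".coalesce(__.unfold(), __.addV().property('{key}', keyValue))"
    )
    # These steps apply to the vertex that comes out of coalesce(),
    # whether it already existed or was just created.
    for index, name in enumerate(sorted(updates)):
        param = f"p{index}"
        bindings[param] = updates[name]
        script += f".property('{name}', {param})"
    return script, bindings


script, bindings = build_upsert_script(
    "tenantId", "ACME2_AX2", {"lastSeen": "2020-10-06"}
)
print(script)
print(bindings)
```

The resulting script and bindings would then be handed to whatever submit call the driver exposes; that part is left out because it depends on the client library.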

Azure cosmosDB gremlin - how to update vertex property with another vertex's property

On an Azure CosmosDB Gremlin instance,
I have two vertices, A and B, linked by an edge E.
Both vertices have a 'name' property.
I'd like to run a query that takes A's name and puts it in B.
When I run
g.V("AId").as("a").oute().inv().hasLabel('B').property("name",select('a').values('name'))
I get the following error :
GraphRuntimeException ExceptionMessage : Gremlin Query Execution Error: Cannot create ValueField on non-primitive type GraphTraversal.
It looks like the select operator is not correctly used.
Thank you for your help
EDITED based on discussion in comments
You have oute and inv in lower case. In general, Gremlin steps use camelCase names such as outE and inV (outside of specific GLVs), but the comments mention that CosmosDB accepts all-lowercase step names. Assuming that is not the issue here, the query as written looks fine as generic Gremlin. The example below was run on TinkerGraph and uses the same select mechanism to pick the property value.
gremlin> g.V(3).as("a").outE().inV().has('code','LHR').property("name",select('a').values('city'))
==>v[49]
gremlin> g.V(49).values('name')
==>Austin
What you are observing may be specific to CosmosDB and it's probably worth contacting their support folks to double check.

Add Additional Property to Neptune DB

I am trying to add an additional property called "insert_date" to the existing vertices and edges. I tried
g.V().setProperty('insert_date',datetime('2020-10-06'))
Error:
{
"requestId": "33cf8df5-3cbe-41ac-b650-5752debec04d",
"code": "MalformedQueryException",
"detailedMessage": "Query parsing failed at line 1, character position at 10, error message : token recognition error at: 'rop'"
}
I am running the above command from a Neptune Notebook.
So far I can only add new vertices with an insert_date property; I have not found a way to alter existing vertices or edges.
Please suggest whether this is possible. I want to implement delta extraction so that I can extract only new vertices or edges every time I run the ETL.
Thanks
To add a property to an existing vertex in Gremlin you use the property() step. For example, if you wanted to add a property insert_date to a vertex with the id of A you would use the following statement:
g.V('A').property('insert_date', '2020-10-06')
The property() step will add or update the specified property to the new value. This will occur for all the current elements being passed in. For example, if you only wanted to update the elements that did not have an insert_date property you could do this via:
g.V().hasNot('insert_date').property('insert_date', '2020-10-06')
In each of these examples the property will be added as part of an array of values. If you want the property to contain only a single value, you can use the property() step overload that takes the cardinality, like this:
g.V('A').property(Cardinality.single, 'insert_date', '2020-10-06')
One thing to note about the code you listed above: while Neptune does support the datetime() function for string-based queries, if you are using a GLV then you will need to create this value yourself and pass in a native Date/Time as described here.
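For string-based queries, the pieces above can be combined into one statement that stamps only the vertices still missing the property. This is a rough sketch with hypothetical helper names; it assumes datetime() accepts an ISO 8601 UTC timestamp of the form yyyy-MM-ddTHH:mm:ssZ, which is worth checking against the Neptune documentation before relying on it.

```python
from datetime import datetime, timezone


def neptune_datetime_literal(moment: datetime) -> str:
    """Render a timezone-aware datetime as a Gremlin datetime('...') call in UTC."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    stamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"datetime('{stamp}')"


def build_backfill_query(moment: datetime) -> str:
    """Build a query that sets insert_date on vertices that do not have it yet."""
    return (
        "g.V().hasNot('insert_date')"
        f".property(Cardinality.single, 'insert_date', {neptune_datetime_literal(moment)})"
    )


query = build_backfill_query(datetime(2020, 10, 6, tzinfo=timezone.utc))
print(query)
```

Storing a real date rather than a plain string keeps range comparisons meaningful, which matters for the delta extraction mentioned in the question.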
The command below worked to add the additional property to the existing graph:
g.V().property("insert_date","2020-01-01 00:00:00")

Create Vertex only if "from" and "to" vertex exists

I want to create 1000+ edges in a single query.
Currently, I am using the AWS Neptune database and Gremlin.Net to create them.
The issue I am facing is speed: it takes a very long time because of the HTTP requests.
So I am planning to combine all of my queries into a string and execute them in a single shot.
_g.AddE("allow").From(_g.V().HasLabel("person").Has("name", "name1")).To(_g.V().HasLabel("phone").Where(__.Out().Has("sensor", "nfc"))).Next();
There is a chance that the "To" (target) vertex is not in the database, in which case this query fails. So I had to check whether that vertex exists, using hasNext(), before executing this query.
As of now it's working fine, but when I combine all 1000+ edge creations at once, is it possible to write a query that doesn't break if the "To" (target) vertex is not found?
You should look at using the Element Existence pattern for each vertex as shown in the TinkerPop Recipes.
In your example you would replace this section of your query:
_g.V().HasLabel("person").Has("name", "name1")
with something like this (I don't have a .NET environment to test the syntax):
__.V().Has("person", "name", "name1").Fold().
Coalesce<Vertex>(__.Unfold<Vertex>(), __.AddV("person").Property("name", "name1"))
This will act as an Upsert and either return the existing vertex or add a new one with the name property. This same pattern can then be used on your To step to ensure that it exists before the edge is created as well.
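If missing endpoints should simply be skipped rather than created, a different shape is possible: start from the source vertex, hop to the target with a mid-traversal V(), and only then call addE(). When either lookup finds nothing the traversal yields no results and no edge is added, so nothing fails. The sketch below builds such a batch as one script string, as the question proposes. It is written in Python only for illustration; the helper names are hypothetical, and it assumes Neptune's string-query behavior of running several statements from one request when they are separated by semicolons, with every statement except the last ending in iterate().

```python
from typing import Iterable, Tuple


def quote(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def edge_statement(person_name: str, sensor: str) -> str:
    """One statement that adds 'allow' edges only when both endpoints exist."""
    return (
        f"g.V().has('person', 'name', {quote(person_name)}).as('p')"
        f".V().hasLabel('phone').where(__.out().has('sensor', {quote(sensor)}))"
        ".addE('allow').from('p').iterate()"
    )


def build_batch(pairs: Iterable[Tuple[str, str]]) -> str:
    """Join the per-edge statements into a single script."""
    return ";\n".join(edge_statement(name, sensor) for name, sensor in pairs)


batch = build_batch([("name1", "nfc"), ("name2", "nfc")])
print(batch)
```

Note that if several phone vertices match the where() filter, this creates one edge per match, so the filters should be as specific as the data requires.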

Gremlin code to find 1 vertex with specific property

I want to return a node where the node has a property as a specific uuid and I just want to return one of them (there could be several matches).
g.V().where('application_uuid', eq(application_uuid).next()
Would the above query return all the nodes? How do I just return 1?
I also want to get the property map of this node. How would I do this?
You would just do:
g.V().has('application_uuid', application_uuid).next()
but even better would be the signature that includes the vertex label (if you can):
g.V().has('vlabel', 'application_uuid', application_uuid).next()
Perhaps going a bit further if you explicitly need just one you could:
g.V().has('vlabel', 'application_uuid', application_uuid).limit(1).next()
so that both the graph provider and/or Gremlin Server know your intent is to only next() back one result. In that way, you may save some extra network traffic/processing.
This is a very basic query, so it is worth reading more about Gremlin. I can suggest the Practical Gremlin book.
As for your query, you can use has to filter by property, and limit to get specific number of results:
g.V().has('application_uuid', application_uuid).limit(1).next()
Running your query without the limit will also return a single result, since the query result is an iterator and next() returns only its first element. Using toList() will return all results as a list.
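The iterator behavior can be pictured with a plain Python generator. This is only an analogy, not Gremlin or any driver API: the generator stands in for a traversal and records which results were actually produced.

```python
from typing import Iterator, List


def matching_vertices(fetched: List[str]) -> Iterator[str]:
    """Stand-in for a traversal: yields results lazily and records each one."""
    for vertex_id in ("v1", "v2", "v3"):
        fetched.append(vertex_id)
        yield vertex_id


# Like next(): only the first result is pulled from the iterator.
fetched_by_next: List[str] = []
first = next(matching_vertices(fetched_by_next))

# Like toList(): the iterator is drained and every result is produced.
fetched_by_list: List[str] = []
everything = list(matching_vertices(fetched_by_list))

print(first, fetched_by_next)
print(everything, fetched_by_list)
```

The question also asks for the property map of the vertex; in Gremlin that is usually obtained by adding a valueMap() step before next().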

What is the model for checking if a GSI key exists or not in DDB?

I have a pretty straight forward question
I want to know if some GSI hash key exists or not.
The best I can find right now is
DynamoDBQueryExpression<T> queryExpression;
// Logic for constructing query
queryExpression.withIndexName(SomeIndexName);
QueryResultPage<T> queryResponse = mapper.queryPage(T.class, queryExpression, someMapperConfig);
Here query result page contains a list of results, I can check if that list has anything and conclude whether it exists or not.
The obvious problem is the efficiency drop when there are things that are present. Is there a way to not move the contents of the item across network IO for the purpose of verification (i.e. a server side total validation of the predicate of checking if some GSI key exists or not)?
When writing an item with, for example, PutItem, you can add a condition specifying that the key must not already exist. DynamoDB then checks whether the provided key is already taken and returns an error if you try to put an item over it. Just catch the error and you know the key was already taken.
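As a rough illustration of such a conditional write, the sketch below builds the PutItem parameters as a plain dictionary, written in Python for brevity. The table name, attribute name and helper name are hypothetical; the parameter names are those of the PutItem API, so the dictionary could be passed to a low-level client call. One caveat: the condition is evaluated against the item with the same primary key in the base table, so it detects a taken table key and does not by itself check a GSI key.

```python
from typing import Any, Dict


def build_conditional_put(
    table_name: str, key_attribute: str, item: Dict[str, Any]
) -> Dict[str, Any]:
    """Build PutItem parameters that fail if an item with this key already exists."""
    if key_attribute not in item:
        raise ValueError(f"item is missing key attribute {key_attribute!r}")
    return {
        "TableName": table_name,
        "Item": item,
        # The write is rejected when an item with the same primary key
        # already has this attribute, i.e. when the key is taken.
        "ConditionExpression": "attribute_not_exists(#k)",
        "ExpressionAttributeNames": {"#k": key_attribute},
    }


params = build_conditional_put("SomeTable", "pk", {"pk": {"S": "user-123"}})
print(params)
```

When the condition fails, DynamoDB reports a ConditionalCheckFailedException, which is the error to catch.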
