Upsert query for JanusGraph which will issue an update to storage backend only if vertex properties are modified - gremlin

I have the following Gremlin upsert query for a given vertex. It does not issue an update to the JanusGraph storage backend if the property values for age and city have not changed.
Vertex marko = (Vertex) traversalSource.V()
.has("person", "name", name)
.fold()
.coalesce(__.unfold(), __.addV("person").property("name", name))
.choose(__.not(__.has("age", age)), __.property("age", age))
.choose(__.not(__.has("city", city)), __.property("city", city))
.next();
I tried to refactor this to make it more generic, with the property changes specified using a map.
Map<String, Object> propsToUpdate = new HashMap<>();
propsToUpdate.put("age", age);
propsToUpdate.put("city", city);
Vertex marko = (Vertex) traversalSource.withSideEffect("propsToUpdate", propsToUpdate)
.V()
.has("person", "name", name)
.fold()
.coalesce(__.unfold(), __.addV("person").property("name", name))
.as("v")
.select("propsToUpdate")
.unfold()
.as("updateProp")
.select("v")
.choose(__.not(__.has(__.select("updateProp").by(Column.keys).toString(),
__.select("updateProp").by(Column.values))),
__.property(__.select("updateProp").by(Column.keys),
__.select("updateProp").by(Column.values)))
.select("v")
.next();
Unfortunately, this generic query does issue an update to the storage backend even if none of the property values have changed.
Any clues as to why the two queries behave differently? Any suggestions for fixing the generic query so that no update is issued to the storage backend?
I am using JanusGraph 0.6.2 for these tests.

The expression __.select("updateProp").by(Column.keys).toString() inside the has() condition does not return the key of updateProp. It is evaluated in Java while the traversal is being built, so it returns the string representation of the bytecode of this anonymous traversal. This makes the condition always false (and its negation always true), so the property is always updated.
I do not believe Gremlin allows retrieving a vertex property through an as() variable, either in a has() step or in a where() step.
Your generalized query may become possible in JanusGraph 1.0, which will use TinkerPop 3.6.1. In that TinkerPop version it is possible to pass a Map to the property() step. This makes it easy to create a local temporary vertex inside the transaction from propsToUpdate and compare it to the existing vertex. Only if they differ do you pass the Map to the property() step for the existing vertex; then you drop the temporary vertex and commit the transaction.
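Until then, one workaround is to do the comparison on the client. The sketch below is only an illustration under stated assumptions: PropertyDiff and changed are hypothetical names, not part of JanusGraph or TinkerPop, and the current values are assumed to have been read into a plain Map already (for example from valueMap() or elementMap(), flattened to single values).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of JanusGraph or TinkerPop.
final class PropertyDiff {
    private PropertyDiff() {}

    // Returns the entries of 'desired' whose value is missing from,
    // or different in, 'current'. Insertion order of 'desired' is kept.
    static Map<String, Object> changed(Map<String, Object> current,
                                       Map<String, Object> desired) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : desired.entrySet()) {
            Object existing = current.get(entry.getKey());
            if (!Objects.equals(existing, entry.getValue())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```

If the returned map is empty, no property() step needs to be sent at all; otherwise only the differing keys are written. Because the keys are known on the client, another option is to append one choose(__.not(__.has(key, value)), __.property(key, value)) step per map entry in a Java loop, which reproduces the shape of the first query.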

Related

UPSERT an edge with properties with Gremlin

I'm trying to upsert an edge (insert if it does not exist but update if it does) with properties in Gremlin via a single query.
I have this which adds a manages edge with a duration property between two vertices with id of Amy and John.
g.E().where(outV().has('id','Amy')).where(inV().has('id','John')).fold().coalesce(unfold(),g.V('Amy').as('a').V('John').addE('manages').from('a').property('duration','1year')).
The query does do an upsert, but if I change the value of the duration property, remove the duration property, or add other properties, the property changes are not reflected on the edge.
I want to be able to change the value of the duration property without adding a new edge or having to remove the old one.
I'm fairly new to Gremlin so please share any advice which may help.
I'm not sure it matters but the underlying DB is an Azure Cosmos DB.
To always have the property change apply whether the edge exists or is created, you just need to move the property step to after the coalesce:
g.E().
where(outV().has('id', 'Amy')).
where(inV().has('id', 'John')).
fold().
coalesce(
unfold(),
g.V('Amy').as('a').
V('John').
addE('manages').from('a')).
property('duration', '1year')
That said, there are a few observations that can be made about the query. Starting with g.E() is likely to be inefficient, and using g.V() mid-traversal should be avoided; where a mid-traversal lookup is necessary, just use V().
If 'John' and 'Amy' are unique IDs, you should take advantage of that along these lines:
g.V('Amy').
outE().where(inV().hasId('John')).
fold().
coalesce(
unfold(),
addE('manages').from(V('Amy')).to(V('John'))).
property('duration', '1year')
Two additions to Kevin's answer:
I think you need to specify the edge label in outE, otherwise the unfold may return other relationships present between the two vertices, and these will be updated with the property, rather than triggering the addE of the new edge. Specifying the edge label should ensure the length of the folded array is 0 in the case of INSERT and 1 in the case of UPDATE.
For Cosmos DB Gremlin API, use the anonymous class __ when not using g
g.V('Amy').
outE('manages').where(inV().hasId('John')).
fold().
coalesce(
unfold(),
__.addE('manages').from(__.V('Amy')).to(__.V('John'))).
property('duration', '1year')
More detailed example here.

Create Vertex only if "from" and "to" vertex exists

I want to create 1000+ Edges in a single query.
Currently, I am using the AWS Neptune database and gremlin.net for creating it.
The issue I am facing is speed. It takes a very long time because of the HTTP requests.
So I am planning to combine all of my queries into a single string and execute them in one shot.
_g.AddE("allow").From(_g.V().HasLabel("person").Has("name", "name1")).To(_g.V().HasLabel("phone").Where(__.Out().Has("sensor", "nfc"))).Next();
The "To" (target) vertex may not exist in the database, and when it does not, this query fails. So I had to check whether that vertex exists, using hasNext(), before executing this query.
As of now it's working fine, but when I combine all 1000+ edge creations into one request, is it possible to write a query that doesn't break if the "To" (target) vertex is not found?
You should look at using the Element Existence pattern for each vertex as shown in the TinkerPop Recipes.
In your example you would replace this section of your query:
_g.V().HasLabel("person").Has("name", "name1")
with something like this (I don't have a .NET environment to test the syntax):
__.V().Has("person", "name", "name1").Fold().
Coalesce(__.Unfold(), __.AddV("person").Property("name", "name1"))
This will act as an Upsert and either return the existing vertex or add a new one with the name property. This same pattern can then be used on your To step to ensure that it exists before the edge is created as well.

Gremlin: property name with static and dynamic

I'm trying to create a vertex property whose name combines a static value and a dynamic value taken from selected properties. Here is the code:
g.V('%s').as('source')
.until(or(hasLabel('target').has('v', '1'),loops().is(10)))
.repeat(__.in())
.outE('e').as('edge')
.inV().as('u')
.select('source')
.property(single, 'v', '1')
.property(single, union(constant('p_'),select('u').id()), select('e').properties('r').value())
This query is meant to copy the property of each edge as the value, using the id of the vertex, prefixed with 'p_', as the property name. The copy works, but the property name does not: only the prefix 'p_' is saved.
Any ideas about this behaviour? I'm using TinkerPop 3.4.3, the same version Neptune uses.
Thanks!
The union() step in this traversal will not return a concatenation of the prefix and the id as you are hoping. Instead, it returns a separate traverser for each item in the union(): in this case, one containing "p_" and one containing the id(). Only the first of these ends up being used as the property name, which is why just "p_" is saved.
Unfortunately, Gremlin does not have a string concatenation function that will accomplish this for you. See the question below:
Concatenate Gremlin GraphTraversal result with string
As you are using Neptune, the proposed solution in that answer will not work either, because Neptune does not support lambdas in a traversal. Unfortunately, in this scenario the best way to accomplish this is likely to return the data to your application, concatenate the strings, and then update the property.
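The application-side step could look roughly like the sketch below. It assumes a first query has already returned the id of each 'u' vertex together with the matching 'r' value as a Map; PrefixedKeys and withPrefix are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds property names such as "p_<id>" on the client,
// since the concatenation cannot be done inside the traversal here.
final class PrefixedKeys {
    private PrefixedKeys() {}

    // 'valuesById' is assumed to map a vertex id to the edge's 'r' value.
    static Map<String, Object> withPrefix(String prefix, Map<Object, Object> valuesById) {
        Map<String, Object> properties = new LinkedHashMap<>();
        for (Map.Entry<Object, Object> entry : valuesById.entrySet()) {
            // String concatenation happens in Java, not in Gremlin.
            properties.put(prefix + entry.getKey(), entry.getValue());
        }
        return properties;
    }
}
```

Each entry of the returned map can then be written back in a second query with property(single, key, value) on the source vertex.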

Tinkerpop Gremlin group by key and get latest

I am creating 2 users(uid=1 & uid=2) with 2 versions each.
g.addV('user1').property('uid',1).property('version',1)
.addV('user1').property('uid',1).property('version',2)
.addV('user1').property('uid',2).property('version',1)
.addV('user1').property('uid',2).property('version',2)
I want to get the latest version from each uid, I am using the uid as a groupBy key and getting the latest as shown
g.V().hasLabel('user1')
.group().by('uid').by(fold().order(Scope.local).by('version', Order.desc).unfold().limit(1)) //GraphTraversal<Vertex,Map<Object, Object>>
.flatMap(t -> t.get().values().iterator()) // convert to GraphTraversal<Vertex, Vertex>
//traverse out and get the path
.out('friend').path().by(elementMap())
Is this the best approach for this requirement?
What would be the gremlin preferred way to convert the Map to a Vertex inside the flatmap rather than using the lambda? Suppose I want to add further steps after this.
Appreciate any help!
The group step has two modes. Without a label it acts as a barrier but with a label it acts as a side effect. You can have results flow through a group using your data as follows.
gremlin> g.V().group('x').by('uid').by(values('version').max())
==>v[42306]
==>v[42309]
==>v[42312]
==>v[42315]
==>v[42318]
gremlin> g.V().group('x').by('uid').by(values('version').max()).cap('x')
==>[1:2,2:2]
You can add more traversal steps of course before you decide what you want to do with the group. Such as:
g.V().group('x').by('uid').by(values('version').max()).out()...
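For readers less familiar with group(), the per-key reduction that by('uid').by(values('version').max()) performs can be mimicked in plain Java. This is only an illustration of the grouping logic on in-memory data, not a replacement for the traversal; LatestVersion and latestByUid are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of "group by uid, keep the highest version".
final class LatestVersion {
    private LatestVersion() {}

    // Each row is {uid, version}; the result maps uid -> highest version seen.
    static Map<Integer, Integer> latestByUid(int[][] uidVersionPairs) {
        Map<Integer, Integer> latest = new TreeMap<>();
        for (int[] pair : uidVersionPairs) {
            // merge() keeps the larger of the stored and the new version.
            latest.merge(pair[0], pair[1], Math::max);
        }
        return latest;
    }
}
```

With the four users created in the question, latestByUid(new int[][] {{1, 1}, {1, 2}, {2, 1}, {2, 2}}) yields a map with 1 -> 2 and 2 -> 2, matching the [1:2,2:2] shown by cap('x').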

Janusgraph/TinkerPop - Constraint violation - How to add or update an existing Vertex

I have defined constraints in my schema to ensure the uniqueness of given vertices based on one or more properties. For example:
mgmt.buildIndex('byTenandIdUnique',Vertex.class).addKey(tenantId).unique().buildCompositeIndex()
As expected now, when I try to add a Vertex that already exists, I am getting an error like below:
aiogremlin.exception.GremlinServerError: 500: Adding this property for key [tenantId] and value [ACME2_AX2] violates a uniqueness constraint [byTenandIdUnique]
I am writing a Python application to load log files, with Goblin OGM, so it is expected that the data will repeat, and I don't want multiple instances of the same Vertex, hence the constraint.
Is there a way in TinkerPop or JanusGraph to update a Vertex in case it already exists instead of throwing this exception? Or is this something that the OGM should handle or maybe the code itself by querying the graph before any transaction?
TinkerPop does not do anything to enforce a schema, so the schema restriction here is specific to JanusGraph. The behavior is as you described: if you have a unique index defined and then attempt to add another element that conflicts with an existing element, an exception is thrown.
From a JanusGraph perspective, your logic will need to account for this properly. The code below is based on a common recipe using the coalesce() step that you can read more about here.
# check for existence of a vertex with the tenantId property
# if the vertex exists, return that vertex
# else create a new vertex with the tenantId
v = g.V().has("tenantId", "ACME2_AX2").fold(). \
coalesce( __.unfold(), __.addV().property("tenantId", "ACME2_AX2") ). \
next()
I don't use Goblin, so I'm not aware of whether Goblin is capable of handling this or whether it passes that responsibility on to the app developer, but checking for existence before setting the property is still an appropriate way to handle the situation.

Resources