Tinkerpop Gremlin group by key and get latest - gremlin

I am creating 2 users(uid=1 & uid=2) with 2 versions each.
g.addV('user1').property('uid',1).property('version',1)
.addV('user1').property('uid',1).property('version',2)
.addV('user1').property('uid',2).property('version',1)
.addV('user1').property('uid',2).property('version',2)
I want to get the latest version from each uid, I am using the uid as a groupBy key and getting the latest as shown
g.V().hasLabel('user1')
.group().by('uid').by(fold().order(Scope.local).by('version', Order.desc).unfold().limit(1)) //GraphTraversal<Vertex,Map<Object, Object>>
.flatmap(t -> t.get().values().iterator()) // convert to GraphTraversal<Vertex, Vertex>
//traverse out and get the path
.out('friend').path().by(elementMap())
Is the best approach for this requirement?
What would be the gremlin preferred way to convert the Map to a Vertex inside the flatmap rather than using the lambda? Suppose I want to add further steps after this.
Appreciate any help!

The group step has two modes. Without a label it acts as a barrier but with a label it acts as a side effect. You can have results flow through a group using your data as follows.
gremlin> g.V().group('x').by('uid').by(values('version').max())
==>v[42306]
==>v[42309]
==>v[42312]
==>v[42315]
==>v[42318]
gremlin> g.V().group('x').by('uid').by(values('version').max()).cap('x')
==>[1:2,2:2]
You can add more traversal steps of course before you decide what you want to do with the group. Such as:
g.V().group('x').by('uid').by(values('version').max())out()...

Related

Tinkerpop Gremlin is it better to query with hasId or to search by property values

Using Tinkerpop Gremlin (Neptune DB), is there a preferred/"faster" way to query?
For example, let's say I have a graph containing the node:
label: Student
id: 'student/12345'
studentId: '12345'
name: 'Bob'
Is there a preferred query? (for this example let's say we know the field 'studentId' value, which is also part of the id)
g.V().filter('studentId', '12345')
vs
g.V().filter(hasId(TextP.containing('12345'))
or using "has"/"hasId" vs "filter"?
g.V().has('studentId', '12345')
vs
g.V().hasId(TextP.containing('12345'))
So there seems to be two questions here, one about filter() vs has() and the other about using the vertex id versus a property.
The answer to the first question is going to depend on the underlying database implementation and what is has/has not optimized. In general, and in Neptune, I would suggest using the g.V().has('studentId', '12345') pattern to filter on a property as it is optimized and easier to read.
The answer to the second question also depends on the database implementaiton, as not all allow for setting of the vertex ids. Other databases may vary but in Neptune setting ids is allowed and a direct lookup by ID is the fastest (e.g. g.V('12345') or g.V().hasId('12345')) way to look something up as it is a single index lookup. One thing to note is that in Neptune vertex/edge id values need to be globally unique so you need to ensure that you will only have one vertex or edge with a specific id.

Azure cosmosDB gremlin - how to update vertex property with another vertex's property

On an Azure cosmosDB gremlin instance,
I have 2 vertices A and B linked by and edge E.
Both vertices has a 'name' property.
I'd like to run a query which will take A's name and put it in B
when I run
g.V("AId").as("a").oute().inv().hasLabel('B').property("name",select('a').values('name'))
I get the following error :
GraphRuntimeException ExceptionMessage : Gremlin Query Execution Error: Cannot create ValueField on non-primitive type GraphTraversal.
It looks like the select operator is not correctly used.
Thank you for your help
EDITED based on discussion in comments
You have oute and inv in lower case. In general, the steps use camelCase naming, such as outE and inV (outside of specific GLVs), but in the comments it was mentioned that CosmosDB will accept all lower case step names. Assuming therefore, that is not the issue here, the query as written looks fine in terms of generic Gremlin. The example below was run using TinkerGraph, and uses the same select mechanism to pick the property value.
gremlin> g.V(3).as("a").outE().inV().has('code','LHR').property("name",select('a').values('city'))
==>v[49]
gremlin> g.V(49).values('name')
==>Austin
What you are observing may be specific to CosmosDB and it's probably worth contacting their support folks to double check.

How to get properties hasid/key with vertexes info in one gremlin query or Gremlin.Net

I try to get properties which has key or id in following query by Gremlin.Net, but vertex info(id and label) in VertexProperty is null in result.
g.V().Properties<VertexProperty>().HasKey(somekey).Promise(p => p.ToList())
So i try another way, but it's return class is Path, and i had to write an ugly code for type conversion.
g.V().Properties<VertexProperty>().HasKey(somekey).Path().By(__.ValueMap<object, object>(true))
Is there a better way to achieve this requirement
I think basically the only thing missing to get what you want is the Project() step.
In order to find all vertices that have a certain property key and then get their id, label, and then all information about that property, you can use this traversal:
g.V().
Has(someKey).
Project<object>("vertexId", "vertexLabel", "property").
By(T.Id).
By(T.Label).
By(__.Properties<object>(someKey).ElementMap<object>()).
Promise(t => t.ToList());
This returns a Dictionary where the keys are the arguments given to the Project step.
If you instead want to filter by a certain property id instead of a property key, then you can do it in a very similar way:
g.V().
Where(__.Properties<object>().HasId(propertyId)).
Project<object>("vertexId", "vertexLabel", "property").
By(T.Id).
By(T.Label).
By(__.Properties<object>(someKey).ElementMap<object>()).
Promise(t => t.ToList());
This filters in both cases first the vertices to only have vertices that have the properties we are looking for. That way, we can use the Project() step afterwards to get the desired data back.
ElementMap should give all information back about the properties that you want.
Note however that these traversals will most likely require a full graph scan in JanusGraph, meaning that it has to iterate over all vertices in your graph. The reason is that these traversals cannot use an index which would make them much more efficient. So, for larger graphs, the traversals will probably not be feasible.
If you had the vertex ids available instead of the property ids in the second traversal, then you could make the traversal a lot more efficient by replacing g.V().Where([...]) simply with g.V(id).

Create Vertex only if "from" and "to" vertex exists

I want to create 1000+ Edges in a single query.
Currently, I am using the AWS Neptune database and gremlin.net for creating it.
The issue I am facing is related to the speed. It took huge time because of HTTP requests.
So I am planning to combine all of my queries in a string and executing in a single shot.
_g.AddE("allow").From(_g.V().HasLabel('person').Has('name', 'name1')).To(_g.V().HasLabel('phone').Where(__.Out().Has('sensor', 'nfc'))).Next();
There are chances that the "To" (target) Vertex may not be available in the database. When it is the case this query fails as well. So I had to apply a check if that vertex exists before executing this query using hasNext().
So as of now its working fine, but when I am thinking of combining all 1000+ edge creation at once, is it possible to write a query which doesn't break if "To" (target) Vertex not found?
You should look at using the Element Existence pattern for each vertex as shown in the TinkerPop Recipes.
In your example you would replace this section of your query:
_g.V().HasLabel('person').Has('name', 'name1')
with something like this (I don't have a .NET environment to test the syntax):
__.V().Has('person', 'name', 'name1').Fold().
coalesce(__.Unfold(), __.AddV('person').Property('name', 'name1')
This will act as an Upsert and either return the existing vertex or add a new one with the name property. This same pattern can then be used on your To step to ensure that it exists before the edge is created as well.

When is it appropriate to use g.query()...vertex() vs g.V.has()

I'm a little confused about this one. Several similar examples can be found throughout the documentation. Such as :
g.V.has('name','hercules').next()
g.query().has("name",EQUAL,"hercules").vertices()
Could someone clarify what the difference in the process is between the two above?
Thanks
The first is gremlin-groovy syntax:
g.V.has('name','hercules').next()
and either iterates all vertices looking for vertices that have a "name" property with a value of "hercules". In the event that "name" is indexed then titan will utilize the index to avoid the linear scan to find such vertices.
The second is basically Java and the Titan API. The above gremlin-groovy code basically compiles down to your second statement:
g.query().has("name",EQUAL,"hercules").vertices()
however, in the case of the second statement it returns an iterator of all vertices that match the filter and doesn't just pop off the first one as shown in the gremlin statement (given the use of next()).

Resources