Gremlin on Azure CosmosDB: how to project the related vertices' properties?

I use Microsoft.Azure.Graphs library to connect to a Cosmos DB instance and query the graph database.
I'm trying to optimize my Gremlin queries so that they select only the properties I actually need. However, I don't know how to choose which properties to select from edges and vertices.
Let's say we start from this query:
gremlin> g.V().hasLabel('user').
project('user', 'edges', 'relatedVertices')
.by()
.by(bothE().fold())
.by(both().fold())
This will return something along the lines of:
{
"user": {
"id": "<userId>",
"type": "vertex",
"label": "user",
"properties": [
// all vertex properties
]
},
"edges": [{
"id": "<edgeId>",
"type": "edge",
"label": "<edgeName>",
"inV": "<relatedVertexId>",
"inVLabel": "<relatedVertexLabel>",
"outV": "<relatedVertexId>",
"outVLabel": "<relatedVertexLabel>",
"properties": [
// edge properties, if any
]
}],
"relatedVertices": [{
"id": "<vertexId>",
"type": "vertex",
"label": "<relatedVertexLabel>",
"properties": [
// all related vertex properties
]
}]
}
Now let's say we take only a couple of properties from the root vertex, which we named "user":
gremlin> g.V().hasLabel('user').
project('id', 'prop1', 'prop2', 'edges', 'relatedVertices')
.by(id)
.by('prop1')
.by('prop2')
.by(bothE().fold())
.by(both().fold())
This makes some progress and yields something along the lines of:
{
"id": "<userId>",
"prop1": "value1",
"prop2": "value2",
"edges": [{
"id": "<edgeId>",
"type": "edge",
"label": "<edgeName>",
"inV": "<relatedVertexId>",
"inVLabel": "<relatedVertexLabel>",
"outV": "<relatedVertexId>",
"outVLabel": "<relatedVertexLabel>",
"properties": [
// edge properties, if any
]
}],
"relatedVertices": [{
"id": "<vertexId>",
"type": "vertex",
"label": "<relatedVertexLabel>",
"properties": [
// all related vertex properties
]
}]
}
Now is it possible to do something similar to edges and related vertices? Say, something along the lines of:
gremlin> g.V().hasLabel('user').
project('id', 'prop1', 'prop2', 'edges', 'relatedVertices')
.by(id)
.by('prop1')
.by('prop2')
.by(bothE().fold()
.project('edgeId', 'edgeLabel', 'edgeInV', 'edgeOutV')
.by(id)
.by(label)
.by(inV)
.by(outV))
.by(both().fold()
.project('vertexId', 'someProp1', 'someProp2')
.by(id)
.by('someProp1')
.by('someProp2'))
My aim is to get an output like this:
{
"id": "<userId>",
"prop1": "value1",
"prop2": "value2",
"edges": [{
"edgeId": "<edgeId>",
"edgeLabel": "<edgeName>",
"edgeInV": "<relatedVertexId>",
"edgeOutV": "<relatedVertexId>"
}],
"relatedVertices": [{
"vertexId": "<vertexId>",
"someProp1": "someValue1",
"someProp2": "someValue2"
}]
}

You were pretty close:
gremlin> g.V().hasLabel('person').
......1> project('name','age','edges','relatedVertices').
......2> by('name').
......3> by('age').
......4> by(bothE().
......5> project('id','inV','outV').
......6> by(id).
......7> by(inV().id()).
......8> by(outV().id()).
......9> fold()).
.....10> by(both().
.....11> project('id','name').
.....12> by(id).
.....13> by('name').
.....14> fold())
==>[name:marko,age:29,edges:[[id:9,inV:3,outV:1],[id:7,inV:2,outV:1],[id:8,inV:4,outV:1]],relatedVertices:[[id:3,name:lop],[id:2,name:vadas],[id:4,name:josh]]]
==>[name:vadas,age:27,edges:[[id:7,inV:2,outV:1]],relatedVertices:[[id:1,name:marko]]]
==>[name:josh,age:32,edges:[[id:10,inV:5,outV:4],[id:11,inV:3,outV:4],[id:8,inV:4,outV:1]],relatedVertices:[[id:5,name:ripple],[id:3,name:lop],[id:1,name:marko]]]
==>[name:peter,age:35,edges:[[id:12,inV:3,outV:6]],relatedVertices:[[id:3,name:lop]]]
Two points you should consider when writing Gremlin:
1. The output of one step feeds the input of the next, so if you don't clearly see what comes out of a particular step, the steps that follow may not end up being right. In your example, in the by() for the edges you added the project() after the fold(), which was basically saying "Hey, Gremlin, project that List of edges for me." But in the by() modulators for that project() you treated the input not as a List but as individual edges, which likely led to an error. In Java, that error is: "java.util.ArrayList cannot be cast to org.apache.tinkerpop.gremlin.structure.Element". An error like that is a clue that somewhere in your Gremlin you are not properly following the outputs and inputs of your steps.
2. fold() takes all the elements in the stream of the traversal and converts them to a List. So where you had many objects, you now have one after the fold(). To process them as a stream again, you would need to unfold() them so that steps can operate on them individually. In this case, we just needed to move the fold() to the end of the statement, after doing the sub-project() for each edge/vertex. But why do we need fold() at all? The answer is that the traversal passed to the by() modulator is not iterated completely by the step that it modifies (in this case project()). The step only calls next() to get the first element in the stream; this is by design. Therefore, in cases where you want the entire stream of a by() to be processed, you must reduce the stream to a single object. You might do that with fold(), but other examples include sum(), count(), mean(), etc.
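The second point can be modeled in a few lines of plain Python. This is only a sketch of the behavior described above, not TinkerPop's implementation: the functions both_e, fold and by are hypothetical stand-ins, and the edge data is copied from the marko row of the console output.

```python
# Toy model of by() and fold(); not TinkerPop code.
# Edges of vertex 1 (marko) as (edge id, outV, inV), taken from the output above.
EDGES = [(9, 1, 3), (7, 1, 2), (8, 1, 4)]

def both_e():
    """Stream the edges one at a time, the way bothE() would."""
    for edge_id, out_v, in_v in EDGES:
        yield {"id": edge_id, "inV": in_v, "outV": out_v}

def fold(stream):
    """Like fold(): collapse the whole stream into one list, emitted once."""
    yield list(stream)

def by(child):
    """Like a by() modulator: take only the first object of the child stream."""
    return next(child, None)

without_fold = by(both_e())       # only the first edge survives
with_fold = by(fold(both_e()))    # the single folded list carries all edges

print(without_fold)
print(with_fold)
```

In this model, without_fold is a single edge map while with_fold is a list of all three, which is why the working query ends each nested traversal with fold().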

Related

Merging list of maps in gremlin

I have this relationship:
person --likes--> subject
This is my query:
g.V().
hasLabel('person').
has('name', 'Joe').
outE('likes').
range(0, 2).
union(identity(), inV().hasLabel('subject')).
valueMap('rating', 'name')
At this point, I get result that looks like this:
[
{
"rating": 3.236155563
},
{
"rating": 3.162886797
},
{
"name": "math"
},
{
"name": "history"
}
]
I'd like to get something like this:
[
{
"rating": 3.236155563,
"name": "math"
},
{
"rating": 3.162886797,
"name": "history"
}
]
I've tried grouping the results - which gives me the structure I want - but because of the identical keys, I only get 1 set of results back.
It always helps when you post the code to create the graph so we can give you a tested answer, like so:
g.addV('person').property('name', 'P1').as('p1').
addV('subject').property('name', 'Math').as('math').
addV('subject').property('name', 'History').as('history').
addV('subject').property('name', 'Geography').as('geography').
addE('likes').from('p1').to('math').property('rating', 1.2).
addE('likes').from('p1').to('history').property('rating', 2.3).
addE('likes').from('p1').to('geography').property('rating', 3.4)
I believe you are trying to write a traversal that starts from a certain person, goes out along the first two "likes" edges, and gets the names of the subjects that person likes together with the rating on the corresponding "likes" edge:
g.V().has('person', 'name', 'P1').
outE('likes').
range(0, 2).
project('SubjectName', 'Rating').
by(inV().values('name')).
by(values('rating'))
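To see why projecting per edge keeps each rating paired with its subject, here is a rough plain-Python model of the same idea. It is only a sketch: the dictionaries mirror the sample graph created above, the names SUBJECT_NAMES, LIKES and project_likes are made up for illustration, and list order stands in for edge order, which a real graph does not guarantee.

```python
from itertools import islice

# In-memory stand-in for the sample graph created above (hypothetical layout).
SUBJECT_NAMES = {"math": "Math", "history": "History", "geography": "Geography"}
LIKES = [
    {"from": "p1", "to": "math", "rating": 1.2},
    {"from": "p1", "to": "history", "rating": 2.3},
    {"from": "p1", "to": "geography", "rating": 3.4},
]

def project_likes(person, limit=2):
    """Build one map per 'likes' edge, reading the rating from the edge
    and the name from the vertex the edge points to."""
    out_edges = (e for e in LIKES if e["from"] == person)  # outE('likes')
    first_edges = islice(out_edges, limit)                 # range(0, limit)
    return [
        {"SubjectName": SUBJECT_NAMES[e["to"]], "Rating": e["rating"]}
        for e in first_edges
    ]

rows = project_likes("p1")
print(rows)
```

Because each map is built while standing on a single edge, the rating and the subject name cannot drift apart the way they do when union() emits edges and vertices as separate results.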

Create a "join" query with data from edge and connected vertex

I have a Cosmos DB with the Gremlin API. In the DB I have one type of vertex labeled User, connected to vertices labeled Company. I want to show all connected companies. I run the query g.V('id-of-User').outE() and get the edges to all connected companies. The result might look something like this:
[
{
"id": "08f97a1d-9e81-4ccc-a498-90eb502b1879",
"label": "AuthorizedSignatory",
"type": "edge",
"inVLabel": "Company",
"outVLabel": "User",
"inV": "abd51134-1524-44fe-8a49-60d2d449a1f3",
"outV": "103bf1b9-464f-4f68-a4ca-7dfdbe94ae84"
},
{
"id": "c36b640b-9574-403b-8ab6-fcce695caa90",
"label": "AuthorizedSignatory",
"type": "edge",
"inVLabel": "Company",
"outVLabel": "User",
"inV": "2c14d279-00a4-41ad-a8c0-f3b882864568",
"outV": "103bf1b9-464f-4f68-a4ca-7dfdbe94ae84"
}
]
This is absolutely as expected. Now I want to take this a bit further and instead of just showing the GUID in the inV parameter I also want to include the Company Name in the result object, but I do not understand how to do the equivalent to a SQL join here.
Can someone please help me!!
What I want is something similar to the example below:
[
{
"id": "08f97a1d-9e81-4ccc-a498-90eb502b1879",
"label": "AuthorizedSignatory",
"type": "edge",
"inVLabel": "Company",
"outVLabel": "User",
"inV": "abd51134-1524-44fe-8a49-60d2d449a1f3",
"outV": "103bf1b9-464f-4f68-a4ca-7dfdbe94ae84",
"CompanyName": "ACME CORP"
},
{
"id": "c36b640b-9574-403b-8ab6-fcce695caa90",
"label": "AuthorizedSignatory",
"type": "edge",
"inVLabel": "Company",
"outVLabel": "User",
"inV": "2c14d279-00a4-41ad-a8c0-f3b882864568",
"outV": "103bf1b9-464f-4f68-a4ca-7dfdbe94ae84",
"CompanyName": "Giganticorp"
}
]
Here CompanyName is one of the properties of the Company vertex whose GUID appears in the inV property.
There is no "join". The data is already connected by way of the edge, so you simply need to traverse further along your graph to get the "CompanyName".
g.V('id-of-User').out().values("CompanyName")
That shows you all of the names of the companies related to that user. If you still want to show the data from the edge in addition to the company name, as in your examples, that's no problem: project() the edge, being specific about what you want:
g.V('id-of-User').outE().
project('eid','label','companyName').
by(T.id).
by(T.label).
by(inV().values("CompanyName"))
Again, note that there is no "join" for the "CompanyName". As the data is implicitly joined by way of the edge you just need to traverse over inV() to reach the data there.

Formatting CosmosDB Gremlin Query

I'm new to Gremlin and CosmosDB. I've been following the tinkerpop tutorials and am using the TinkerFactory.createModern() test graph.
What I am looking for is to return a GraphSON object similar to this from CosmosDB:
{
"user": {
"name": "Marko",
"age": 29
},
"knows": [
{"name": "josh", "age": 32},
{"name": "vadas", "age": 27}
],
"created": [
{"name": "lop", "lang": "java"}
]
}
My thoughts were to try
g.V().has('name', 'marko').as('user').out('knows').as('knows').out('created').as('created').select('user', 'knows', 'created')
What I really get back is in the picture below.
I was hoping to have a single user object, with an array of knows objects and an array of software objects.
If this is possible, can you please explain which steps need to be used to get this format?
Hope my question is clear and thanks to anyone that can help =)
You should use project():
gremlin> g.V().has('person','name','marko').
......1> project('user','knows','created').
......2> by(project('name','age').by('name').by('age')).
......3> by(out('knows').project('name','age').by('name').by('age')).
......4> by(out('created').project('name','lang').by('name').by('lang'))
==>[user:[name:marko,age:29],knows:[name:vadas,age:27],created:[name:lop,lang:java]]
That syntax should work with CosmosDB. In TinkerPop 3.4.0, things get a little nicer as you can use valueMap() a bit more effectively (but I don't think that CosmosDB supports that as of the time of this answer):
gremlin> g.V().has('person','name','marko').
......1> project('user','knows','created').
......2> by(valueMap('name','age').by(unfold())).
......3> by(out('knows').valueMap('name','age').by(unfold())).
......4> by(out('created').valueMap('name','lang').by(unfold()))
==>[user:[name:marko,age:29],knows:[name:vadas,age:27],created:[name:lop,lang:java]]

Gremlin querying Edge inVLabel, outVLabel

I have the following example edge labeled "posts". A "posts" edge can have multiple types of parent vertex (outVLabel), such as "channel", "publisher", "user", etc. How do you query for all edges that have an outVLabel of "channel" without interrogating the label on the out() vertex? I want an array of "posts" edges returned.
Query:
g.E().hasLabel('posts').has(???, 'channel')
Edge object:
[{
"id": "83c972b0-315d-49fe-a735-882c4dcbdaa2",
"label": "posts",
"type": "edge",
"inVLabel": "article",
"outVLabel": "channel",
"inV": "7410b6c8-ed70-4a00-800c-489d596907da",
"outV": "c8c5f45d-0195-49c5-b7ae-9eda1d441bc9",
"properties": {
"service": "rss"
}
}]
You would have to do:
g.E().hasLabel('posts').where(outV().hasLabel('channel'))
or if necessary, denormalize and place the outgoing vertex label on the edge as a property, in which case you could then do:
g.E().has('posts', 'outVLabel', 'channel')
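The trade-off between the two options can be sketched in plain Python. This is an illustrative model only: the vertex and edge ids are invented, and the second option assumes that you wrote an 'outVLabel' property onto each edge yourself when creating it.

```python
# Hypothetical in-memory graph; ids and data are made up for illustration.
VERTICES = {
    "ch1": {"label": "channel"},
    "u1": {"label": "user"},
    "a1": {"label": "article"},
    "a2": {"label": "article"},
}
EDGES = [
    {"id": "e1", "label": "posts", "outV": "ch1", "inV": "a1",
     "properties": {"service": "rss", "outVLabel": "channel"}},
    {"id": "e2", "label": "posts", "outV": "u1", "inV": "a2",
     "properties": {"service": "web", "outVLabel": "user"}},
]

def posts_by_traversal(out_label):
    """Model of where(outV().hasLabel(...)): look up the source vertex per edge."""
    return [e for e in EDGES
            if e["label"] == "posts"
            and VERTICES[e["outV"]]["label"] == out_label]

def posts_by_property(out_label):
    """Model of has('posts', 'outVLabel', ...): read the denormalized copy."""
    return [e for e in EDGES
            if e["label"] == "posts"
            and e["properties"].get("outVLabel") == out_label]

print([e["id"] for e in posts_by_traversal("channel")])
print([e["id"] for e in posts_by_property("channel")])
```

Both filters return the same edges only as long as the copied property stays in sync with the real vertex label, which is the cost of denormalizing.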

Get vertices with a simpler format

Is there a way to get a list of vertices with a simpler format?
Currently, the following query:
g.V().has(label, 'Quantity').has('text', '627 km');
returns an object like this:
{
"id": 42545168,
"label": "Quantity",
"type": "vertex",
"properties": {
"sentence": [
{
"id": "pkbgi-pbw28-745",
"value": "null"
}
],
"updated_text": [
{
"id": "pk9vm-pbw28-5j9",
"value": "627 km"
}
],[...]
And when I get a list of edges it is formatted in a simpler format:
g.E().has(label, 'locatedAt').has('out_entity_id','41573-41579');
returns:
{
"id": "ozfnt-ip8o-2mtx-g8vs",
"label": "locatedAt",
"type": "edge",
"inVLabel": "Location",
"outVLabel": "Location",
"inV": 758008,
"outV": 872520,
"properties": {
"sentence": "Bolloré is a corporation (société anonyme) with a Board of Directors whose registered offi ce is located at Odet, 29500 Ergué-Gabéric in France.",
"in_entity_id": "41544-41548",
"score": "0.795793",
"out_entity_id": "41573-41579"
}
}
How so?
Is there a way to get vertices formatted this way?
My advice is that, rather than having your query return the whole vertex, you return only the specific properties you are interested in: for example the vertex ID, a few selected properties, or a valueMap. Otherwise what you get back is essentially everything. This is much like avoiding "select *" in SQL and selecting only what you really care about.
Edited to add an example that returns the IDs of matching vertices.
g.V().has(label, 'Quantity').has('text', '627 km').id().fold()
This will yield a result that looks like this:
{"requestId":"73f40519-87c8-4037-a9fc-41be82b3b227","status":{"message":"","code":200,"attributes":{}},"result":{"data":[[20608,28920,32912,106744,123080,135200,139296,143464,143488,143560,151584,155688,155752,159784,188520,254016,282688,286968,311360,323832,348408,4344,835648,8336,1343616,12352]],"meta":{}}}
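On the client side, a response shaped like the one above can be unpacked with a few lines of Python. This is a minimal sketch: extract_ids is a hypothetical helper, and the sample string is a shortened copy of the response with the ID list cut to three entries for brevity.

```python
import json

# Shortened copy of the response above; the ID list is truncated for brevity.
raw = (
    '{"requestId":"73f40519-87c8-4037-a9fc-41be82b3b227",'
    '"status":{"message":"","code":200,"attributes":{}},'
    '"result":{"data":[[20608,28920,32912]],"meta":{}}}'
)

def extract_ids(response_text):
    """Return the folded list of vertex IDs from a response of this shape."""
    response = json.loads(response_text)
    status = response["status"]
    if status["code"] != 200:
        raise RuntimeError("query failed: %s" % status["message"])
    data = response["result"]["data"]
    # fold() produced one list, so it arrives as the single element of "data".
    return data[0] if data else []

ids = extract_ids(raw)
print(ids)
```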
