How to get the results from properties of different vertices in Gremlin?

I have this database:
Clients => Incident => File => Filename
Clients have an ID.
Incidents have an ID and a reportedOn property.
Files have an ID and fileSize, mimeType, and malware properties.
Filenames have an ID.
Clients have an outgoing edge to incidents (reported), incidents have an outgoing edge to files (containsFile), and files have an outgoing edge to filenames (hasName).
Which query do I have to execute in Gremlin to get the filename-ID, the file-ID, the file-fileSize and the incident-reportedOn values in one result?
Here is some sample data:
g.addV('client').property('id','1')
addV('incident').property('id','11').property('reportedON'2/15/2019 8:01:19 AM')
addV('file').property('id','100').property('fileSize', '432534')
addV('fileName').property('id','file.pdf')
addE('reported').from('1').to('11').
addE('containsFile').from('11').to('100').
addE('hasName').from('100').to('file.pdf').iterate()

The traversal you've posted to create the sample data contains several errors. Please double-check what you post.
Anyway, here's a fixed version of your query:
g.addV('client').property('id','1').as('1').
addV('incident').property('id','11').property('reportedON', '2/15/2019 8:01:19 AM').as('11').
addV('file').property('id','100').property('fileSize', '432534').as('100').
addV('fileName').property('id','file.pdf').as('file.pdf').
addE('reported').from('1').to('11').
addE('containsFile').from('11').to('100').
addE('hasName').from('100').to('file.pdf').iterate()
To "get the filename-ID, the file-ID, the file-fileSize and the incident-reportedOn values":
gremlin> g.V().has('client','id','1').
......1> out('reported').as('incident').
......2> out('containsFile').
......3> out('hasName').
......4> path().
......5> from('incident').
......6> by(union(group().
......7> by(label).
......8> by('id'),
......9> valueMap()).
.....10> unfold().
.....11> filter(select(keys).is(neq('id'))).
.....12> group().
.....13> by(keys).
.....14> by(select(values).unfold())).
.....15> unfold().unfold().
.....16> group().
.....17> by(keys).
.....18> by(select(values).unfold())
==>[fileName:file.pdf,file:100,reportedON:2/15/2019 8:01:19 AM,fileSize:432534,incident:11]
path().from('incident').by(valueMap()) alone would already give you everything you need. However, I added a bit of re-grouping to get a more nicely formatted result.
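If you would rather keep the traversal short and do the re-grouping in client code, the same flattening can be sketched in Python. This is only an illustration: the input shape (one dict per path element, with a plain 'label' key and list-valued properties) and the helper name flatten_path are assumptions, not what any particular driver returns.

```python
from typing import Any, Dict, List


def flatten_path(path: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge per-element property maps into one flat result map.

    Each 'id' is stored under the element's label (as the re-grouping in the
    traversal does); every other property keeps its own key.
    """
    merged: Dict[str, Any] = {}
    for element in path:
        for key, values in element.items():
            if key == "label":
                continue
            # valueMap()-style results wrap each value in a list; unwrap singletons.
            value = values[0] if isinstance(values, list) and len(values) == 1 else values
            if key == "id":
                merged[element["label"]] = value
            else:
                merged[key] = value
    return merged


# Hypothetical client-side representation of the path from the incident onwards.
path = [
    {"label": "incident", "id": ["11"], "reportedON": ["2/15/2019 8:01:19 AM"]},
    {"label": "file", "id": ["100"], "fileSize": ["432534"]},
    {"label": "fileName", "id": ["file.pdf"]},
]

result = flatten_path(path)
print(result)
```

With the sample data this yields the same five key/value pairs as the console output above, just computed on the client.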

Related

Can we use JSON to search in Azure Cognitive Search like in MongoDB?

I am working on the Cosmos DB SQL API and I want to run JSON-formatted query strings. In MongoDB you can run queries like
"$and": [
  {
    "images": {
      "$exists": true
    },
    "$where": "this.something.length > 1"
  },
  {
    "location": "core"
  }
]
Is there a way to run similar queries in Azure Cognitive Search and Cosmos DB?
For JSON queries, Cosmos DB for NoSQL supports JSON expressions. Here is a small sample for the Cosmos DB NoSQL indexer, a projection query with JSON expressions:
SELECT VALUE { "id":c.id, "Name":c.contact.firstName, "Company":c.company, "_ts":c._ts } FROM c WHERE c._ts >= @HighWaterMark ORDER BY c._ts

Gremlin - extracting T.id and T.label from valueMap

This question is related to this post that Kelvin Lawrence answered very helpfully. I wanted to post it as a separate question because the first question was already answered well.
From this query:
g.V('a2661f57-8aa7-4e5c-9c89-55cf9b7e4cf8').as('self')
.sideEffect(out('rated').store('movies'))
.out('friended')
.group()
.by(valueMap(true).by(unfold()))
.by(out('rated').where(within('movies')).count())
.order(local)
.by(values,desc)
.unfold()
.select(keys)
I get this result:
1 {<T.id: 1>: 'fdc45bd3-be08-4716-b20f-b4f04987c5e0', <T.label: 4>: 'user', 'username': 'elin102dev', 'name': 'elin obrien', 'avatarUrl': 'public/avatars/fdc45bd3-be08-4716-b20f-b4f04987c5e0.jpg'}
2 {<T.id: 1>: 'bbf1b0db-68cc-41f1-8c7a-5fd77b698e39', <T.label: 4>: 'user', 'username': 'iris', 'name': 'Iris Ebert', 'avatarUrl': 'public/avatars/bbf1b0db-68cc-41f1-8c7a-5fd77b698e39.jpg'}
3 {<T.id: 1>: '34c2ea80-4f84-4652-a7c3-48ce43d9aea7', <T.label: 4>: 'user', 'username': 'iris103dev', 'name': 'iris obrien', 'avatarUrl': 'public/avatars/34c2ea80-4f84-4652-a7c3-48ce43d9aea7.jpg'}
I want to convert the T.id and T.label keys in the response to simply "id" and "label". Kelvin, if you're reading this, I tried appending the following to the query above, but it returns 0 results:
.select('id', 'label', 'username', 'name', 'avatarUrl')
.by(T.id)
.by(T.label)
.by('username')
.by('name')
.by('avatarUrl')
.toList()
I could use a little more help figuring this out, not having much success. Thanks in advance.
It is not possible to do a select(T.id) inside a query. In code, if you get a map back, you can access the T.id field. For a case such as this, it is better to delay fetching the properties until you finally need them. You might try rewriting the query like this:
g.V('a2661f57-8aa7-4e5c-9c89-55cf9b7e4cf8').as('self').
sideEffect(out('rated').store('movies')).
out('friended').
group().
by().
by(out('rated').where(within('movies')).count()).
order(local).
by(values,desc).
unfold().
select(keys).
project('id','label','username').
by(id).
by(label).
by('username')
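As for accessing the T.id field in code: if you keep the original valueMap(true) query, the keys can be renamed on the client once the map comes back. The following is a minimal Python sketch, not driver code. The T enum here is a local stand-in that mirrors the repr shown in the question (<T.id: 1>, <T.label: 4>) so the example runs without a Gremlin server, and stringify_keys is a hypothetical helper name.

```python
from enum import Enum
from typing import Any, Dict


class T(Enum):
    """Stand-in for the driver's T enum (assumption: members expose .name)."""
    id = 1
    label = 4


def stringify_keys(row: Dict[Any, Any]) -> Dict[str, Any]:
    """Replace enum keys such as T.id with their plain names ('id', 'label')."""
    return {
        (key.name if isinstance(key, Enum) else key): value
        for key, value in row.items()
    }


# One row shaped like the result in the question.
row = {
    T.id: "fdc45bd3-be08-4716-b20f-b4f04987c5e0",
    T.label: "user",
    "username": "elin102dev",
}

clean = stringify_keys(row)
print(clean)
```

Applying stringify_keys to every row of the result list gives maps with plain "id" and "label" keys, which also serialize to JSON without special handling.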

Merging list of maps in gremlin

I have this relationship:
person --likes--> subject
This is my query:
g.V().
hasLabel('person').
has('name', 'Joe').
outE('likes').
range(0, 2).
union(identity(), inV().hasLabel('subject')).
valueMap('rating', 'name')
At this point, I get a result that looks like this:
[
  {
    "rating": 3.236155563
  },
  {
    "rating": 3.162886797
  },
  {
    "name": "math"
  },
  {
    "name": "history"
  }
]
I'd like to get something like this:
[
  {
    "rating": 3.236155563,
    "name": "math"
  },
  {
    "rating": 3.162886797,
    "name": "history"
  }
]
I've tried grouping the results - which gives me the structure I want - but because of the identical keys, I only get 1 set of results back.
It always helps when you post the code to create the graph so we can give you a tested answer. Like so:
g.addV('person').property('name', 'P1').as('p1').
addV('subject').property('name', 'Math').as('math').
addV('subject').property('name', 'History').as('history').
addV('subject').property('name', 'Geography').as('geography').
addE('likes').from('p1').to('math').property('rating', 1.2).
addE('likes').from('p1').to('history').property('rating', 2.3).
addE('likes').from('p1').to('geography').property('rating', 3.4)
I believe you are trying to write a traversal that starts from a certain person, goes out along the first two "likes" edges, and gets the names of the subjects that person likes together with the rating on the corresponding "likes" edge:
g.V().has('person', 'name', 'P1').
outE('likes').
range(0, 2).
project('SubjectName', 'Rating').
by(inV().values('name')).
by(values('rating'))
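To see why project() gives one combined map per edge, it can help to model the traversal in plain Python. This is a rough sketch under stated assumptions, not Gremlin semantics in full: the likes list is a hypothetical in-memory copy of the sample edges, the list order stands in for whatever edge order the graph returns (which a real graph does not guarantee), and project_style is an illustrative name.

```python
from typing import Any, Dict, List

# In-memory stand-in for P1's outgoing 'likes' edges from the sample graph.
likes: List[Dict[str, Any]] = [
    {"rating": 1.2, "inV": {"name": "Math"}},
    {"rating": 2.3, "inV": {"name": "History"}},
    {"rating": 3.4, "inV": {"name": "Geography"}},
]


def project_style(edges: List[Dict[str, Any]], limit: int = 2) -> List[Dict[str, Any]]:
    """Model of range(0, limit).project(...): build ONE map per edge.

    Both values are read while the traverser is still on the edge, so the
    rating and the subject name can never be separated from each other.
    """
    return [
        {"SubjectName": edge["inV"]["name"], "Rating": edge["rating"]}
        for edge in edges[:limit]
    ]


rows = project_style(likes)
print(rows)
```

In contrast, the union() in the question emits the edge and the vertex as separate traversers, so each one produces its own map and the pairing is lost.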

Formatting CosmosDB Gremlin Query

I'm new to Gremlin and CosmosDB. I've been following the TinkerPop tutorials and am using the TinkerFactory.createModern() test graph.
What I am looking for is to return a GraphSON object similar to this from CosmosDB.
{
  "user": {
    "name": "Marko",
    "age": 29
  },
  "knows": [
    {"name": "josh", "age": 32},
    {"name": "vadas", "age": 27}
  ],
  "created": [
    {"name": "lop", "lang": "java"}
  ]
}
My thoughts were to try
g.V().has('name', 'marko').as('user').out('knows').as('knows').out('created').as('created').select('user', 'knows', 'created')
What I really get back is in the picture below.
I was hoping to have a single user object, with an array of knows objects and an array of software objects.
If this is possible, can you please explain which steps need to be used to get this format?
Hope my question is clear and thanks to anyone that can help =)
You should use project():
gremlin> g.V().has('person','name','marko').
......1> project('user','knows','created').
......2> by(project('name','age').by('name').by('age')).
......3> by(out('knows').project('name','age').by('name').by('age')).
......4> by(out('created').project('name','lang').by('name').by('lang'))
==>[user:[name:marko,age:29],knows:[name:vadas,age:27],created:[name:lop,lang:java]]
That syntax should work with CosmosDB. In TinkerPop 3.4.0, things get a little nicer as you can use valueMap() a bit more effectively (but I don't think that CosmosDB supports that as of the time of this answer):
gremlin> g.V().has('person','name','marko').
......1> project('user','knows','created').
......2> by(valueMap('name','age').by(unfold())).
......3> by(out('knows').valueMap('name','age').by(unfold())).
......4> by(out('created').valueMap('name','lang').by(unfold()))
==>[user:[name:marko,age:29],knows:[name:vadas,age:27],created:[name:lop,lang:java]]

Gremlin on Azure CosmosDB: how to project the related vertices' properties?

I use Microsoft.Azure.Graphs library to connect to a Cosmos DB instance and query the graph database.
I'm trying to optimize my Gremlin queries so that they select only the properties I require. However, I don't know how to choose which properties to select from edges and vertices.
Let's say we start from this query:
gremlin> g.V().hasLabel('user').
project('user', 'edges', 'relatedVertices')
.by()
.by(bothE().fold())
.by(both().fold())
This will return something along the lines of:
{
"user": {
"id": "<userId>",
"type": "vertex",
"label": "user",
"properties": [
// all vertex properties
]
},
"edges": [{
"id": "<edgeId>",
"type": "edge",
"label": "<edgeName>",
"inV": <relatedVertexId>,
"inVLabel": "<relatedVertexLabel>",
"outV": "<relatedVertexId>",
"outVLabel": "<relatedVertexLabel>"
"properties": [
// edge properties, if any
]
}],
"relatedVertices": [{
"id": "<vertexId>",
"type": "vertex",
"label": "<relatedVertexLabel>",
"properties": [
// all related vertex properties
]
}]
}
Now let's say we only take a couple of properties from the root vertex which we named "User":
gremlin> g.V().hasLabel('user').
project('id', 'prop1', 'prop2', 'edges', 'relatedVertices')
.by(id)
.by('prop1')
.by('prop2')
.by(bothE().fold())
.by(both().fold())
This makes some progress and yields something along the lines of:
{
"id": "<userId>",
"prop1": "value1",
"prop2": "value2",
"edges": [{
"id": "<edgeId>",
"type": "edge",
"label": "<edgeName>",
"inV": <relatedVertexId>,
"inVLabel": "<relatedVertexLabel>",
"outV": "<relatedVertexId>",
"outVLabel": "<relatedVertexLabel>"
"properties": [
// edge properties, if any
]
}],
"relatedVertices": [{
"id": "<vertexId>",
"type": "vertex",
"label": "<relatedVertexLabel>",
"properties": [
// all related vertex properties
]
}]
}
Now is it possible to do something similar to edges and related vertices? Say, something along the lines of:
gremlin> g.V().hasLabel('user').
project('id', 'prop1', 'prop2', 'edges', 'relatedVertices')
.by(id)
.by('prop1')
.by('prop2')
.by(bothE().fold()
.project('edgeId', 'edgeLabel', 'edgeInV', 'edgeOutV')
.by(id)
.by(label)
.by(inV)
.by(outV))
.by(both().fold()
.project('vertexId', 'someProp1', 'someProp2')
.by(id)
.by('someProp1')
.by('someProp2'))
My aim is to get an output like this:
{
"id": "<userId>",
"prop1": "value1",
"prop2": "value2",
"edges": [{
"edgeId": "<edgeId>",
"edgeLabel": "<edgeName>",
"edgeInV": <relatedVertexId>,
"edgeOutV": "<relatedVertexId>"
}],
"relatedVertices": [{
"vertexId": "<vertexId>",
"someProp1": "someValue1",
"someProp2": "someValue2"
}]
}
You were pretty close:
gremlin> g.V().hasLabel('person').
......1> project('name','age','edges','relatedVertices').
......2> by('name').
......3> by('age').
......4> by(bothE().
......5> project('id','inV','outV').
......6> by(id).
......7> by(inV().id()).
......8> by(outV().id()).
......9> fold()).
.....10> by(both().
.....11> project('id','name').
.....12> by(id).
.....13> by('name').
.....14> fold())
==>[name:marko,age:29,edges:[[id:9,inV:3,outV:1],[id:7,inV:2,outV:1],[id:8,inV:4,outV:1]],relatedVertices:[[id:3,name:lop],[id:2,name:vadas],[id:4,name:josh]]]
==>[name:vadas,age:27,edges:[[id:7,inV:2,outV:1]],relatedVertices:[[id:1,name:marko]]]
==>[name:josh,age:32,edges:[[id:10,inV:5,outV:4],[id:11,inV:3,outV:4],[id:8,inV:4,outV:1]],relatedVertices:[[id:5,name:ripple],[id:3,name:lop],[id:1,name:marko]]]
==>[name:peter,age:35,edges:[[id:12,inV:3,outV:6]],relatedVertices:[[id:3,name:lop]]]
Two points you should consider when writing Gremlin:
1. The output of the previous step feeds into the input of the following step and if you don't clearly see what's coming out of a particular step, then the steps that follow may not end up being right. In your example, in the first by() you added the project() after the fold() which was basically saying "Hey, Gremlin, project that List of edges for me." But in the by() modulators for project() you treated the input to project not as a List but as individual edges which likely led to an error. In Java, that error is: "java.util.ArrayList cannot be cast to org.apache.tinkerpop.gremlin.structure.Element". An error like that is a clue that somewhere in your Gremlin you are not properly following the outputs and inputs of your steps.
2. fold() takes all the elements in the stream of the traversal and converts them to a List. So where you had many objects, you will now have one after the fold(). To process them as a stream again, you would need to unfold() them for steps to operate on them individually. In this case, we just needed to move the fold() to the end of the statement after doing the sub-project() for each edge/vertex. But why do we need fold() at all? The answer is that the traversal passed to the by() modulator is not iterated completely by the step that it modifies (in this case project()). The step only calls next() to get the first element in the stream - this is by design. Therefore, in cases where you want the entire stream of a by() to be processed you must reduce the stream to a single object. You might do that with fold(), but other examples include sum(), count(), mean(), etc.
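The "only calls next()" behaviour can be modelled with ordinary Python iterators. This is a toy sketch, not TinkerPop code: NEIGHBOURS copies marko's related vertices from the output above, and both(), fold() and by() are hypothetical stand-ins that mimic only the one aspect being discussed.

```python
from typing import Any, Callable, Dict, Iterator, List

# Related vertices taken from the console output above (marko's row).
NEIGHBOURS: Dict[str, List[str]] = {"marko": ["lop", "vadas", "josh"]}


def both(name: str) -> Iterator[str]:
    """Stand-in for both(): a stream with one item per related vertex."""
    return iter(NEIGHBOURS[name])


def fold(stream: Iterator[Any]) -> Iterator[List[Any]]:
    """Stand-in for fold(): reduce the whole stream to a stream of ONE list."""
    return iter([list(stream)])


def by(child: Callable[[str], Iterator[Any]], start: str) -> Any:
    """Stand-in for a by() modulator: take only the first result of the child."""
    return next(child(start))


without_fold = by(lambda v: both(v), "marko")
with_fold = by(lambda v: fold(both(v)), "marko")

print(without_fold)  # only the first related vertex
print(with_fold)     # every related vertex, because fold() made them one object
```

Swapping fold() for another reducing function (a sum or a count, say) follows the same pattern: the child stream must end in exactly one object if all of its elements are meant to contribute.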

Resources