Gremlin - search for vertex by properties of property

I have a vertex that has nested properties under a specific property. Example:
{
"id": "X",
"label": "deployment",
"type": "vertex",
"properties": {
"name": [
{
"id": "X",
"value": "myvalue1"
}
],
"labels": [
{
"id": "xxxxx",
"value": "my-labels",
"properties": {
"key": "value"
}
}
]
}
}
My problem: I would like to search for a sub-property with a specific value. How would I construct the query to find vertices with that value? I can't seem to find any documentation on finding a sub-property.
There is plenty of documentation on finding and sorting on a property of a vertex, but not on this.
The goal is that there are many "labels" under my labels property, and I eventually want to create edges among vertices with matching sub-labels.

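This will be a scan over all vertices, so be warned that it's not going to be a high-performance query. Note that properties() takes the property key ("labels" in your example), not the property value ("my-labels"):
g.V().filter(properties("labels").has("key", "value"))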
To give you an example over The Crew graph:
//
// Where did TinkerPop crew members move in and after 2005?
//
gremlin> g = TinkerFactory.createTheCrew().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:14], standard]
gremlin> g.V().filter(properties("location").has("startTime", gte(2005))).
project("name","locations").
by("name").
by(properties("location").has("startTime", gte(2005)).value().fold())
==>[name:marko,locations:[santa fe]]
==>[name:stephen,locations:[purcellville]]
==>[name:matthias,locations:[baltimore,oakland,seattle]]
==>[name:daniel,locations:[kaiserslautern,aachen]]
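To make the semantics of that filter concrete, here is a minimal in-memory sketch in plain JavaScript (not Gremlin). It assumes the vertices are available in the JSON shape shown in the question; the sample ids and the helper name hasMetaProperty are hypothetical.

```javascript
// Sample vertices in the same JSON shape as the question (ids are made up).
const vertices = [
  {
    id: "a",
    label: "deployment",
    properties: {
      labels: [{ id: "l1", value: "my-labels", properties: { key: "value" } }],
    },
  },
  {
    id: "b",
    label: "deployment",
    properties: {
      labels: [{ id: "l2", value: "my-labels", properties: { key: "other" } }],
    },
  },
  { id: "c", label: "deployment", properties: {} },
];

// Keep a vertex if any entry under `propertyName` carries the given
// meta-property key with the given value.
function hasMetaProperty(vertex, propertyName, metaKey, metaValue) {
  const entries = (vertex.properties || {})[propertyName] || [];
  return entries.some(
    (entry) =>
      entry.properties !== undefined && entry.properties[metaKey] === metaValue
  );
}

const matchingIds = vertices
  .filter((v) => hasMetaProperty(v, "labels", "key", "value"))
  .map((v) => v.id);

console.log(matchingIds); // only vertex "a" matches
```

Like the traversal, this inspects every vertex, which is why the scan warning applies.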

Related

Cosmos DB query syntax WHERE clause with array in array

The following json represents two documents in a Cosmos DB container.
How can I write a query that gets any document that has an item with an id of item_1 and a value of bar?
I've looked into ARRAY_CONTAINS, but can't get it to work with arrays in arrays.
I've also tried a few things with any. Although I can't find any documentation on how to use it, any seems to be a valid function, as it gets syntax highlighting in the Cosmos DB explorer in the Azure Portal.
For the any function I tried things like SELECT * FROM c WHERE c.pages.any(p, p.items.any(i, i.id = "item_1" AND i.value = "bar")).
The id fields are unique so if it's easier to find any document that contains any object with the right id and value, that would be fine too.
[
{
"type": "form",
"id": "form_a",
"pages": [
{
"name": "Page 1",
"id": "page_1",
"items": [
{
"id": "item_1",
"value": "foo"
}
]
}
]
},
{
"type": "form",
"id": "form_b",
"pages": [
{
"name": "Page 1",
"id": "page_1",
"items": [
{
"id": "item_1",
"value": "bar"
}
]
}
]
}
]
I think a JOIN combined with a WHERE clause can handle an array within an array. Please test the SQL below:
SELECT c.id FROM c
join pages in c.pages
where array_contains(pages.items,{"id": "item_1","value": "bar"},true)
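As a rough illustration of what that query computes, the following JavaScript sketch walks the same two sample documents in memory. It is not Cosmos DB code; partialMatch and findDocIds are hypothetical helpers that mimic the JOIN over pages and the partial-match mode of ARRAY_CONTAINS (third argument true).

```javascript
const docs = [
  {
    type: "form",
    id: "form_a",
    pages: [{ name: "Page 1", id: "page_1", items: [{ id: "item_1", value: "foo" }] }],
  },
  {
    type: "form",
    id: "form_b",
    pages: [{ name: "Page 1", id: "page_1", items: [{ id: "item_1", value: "bar" }] }],
  },
];

// Partial match: every field of `wanted` must be present and equal on `item`.
function partialMatch(item, wanted) {
  return Object.keys(wanted).every((k) => item[k] === wanted[k]);
}

// Mirrors the JOIN: one output row per (document, page) pair that passes the filter.
function findDocIds(documents, wanted) {
  const rows = [];
  for (const doc of documents) {
    for (const page of doc.pages || []) {
      if ((page.items || []).some((item) => partialMatch(item, wanted))) {
        rows.push({ id: doc.id });
      }
    }
  }
  return rows;
}

const rows = findDocIds(docs, { id: "item_1", value: "bar" });
console.log(rows); // a single row for form_b
```

Because the JOIN yields one row per page, a document with two matching pages would appear twice in the result.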

Azure Cosmos Graph How to select vertex properties to return?

If I have a vertex like this:
{
"id": "1",
"label": "user",
"type": "vertex",
"outE": {
"worksAt": [
{
"id": "6e47aa14-0a3a-4e45-8ac4-043ec9f32b50",
"inV": "spaceneedle.com.br"
}
]
},
"properties": {
"name": [
{
"id": "cce42090-efc5-4bb2-9576-922d19164d98",
"value": "Murilo"
}
],
"domain": [
{
"id": "murilo|domain",
"value": "spaceneedle.com.br"
}
]
}
}
Is it possible to choose properties to return to have an object like the following using gremlin?
{
"id": "1",
"name": "Murilo"
}
Thanks!
I'll use the TinkerPop "modern" toy graph to demonstrate some options:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
You could do something like this:
gremlin> g.V(1).valueMap(true,'name')
==>[label:person,name:[marko],id:1]
but that includes the vertex label and wraps the "name" in a list (to account for multi-properties). So, while quick and easy, it's not quite a match for the output you asked for. To get that specific output, I would use the project() step, which would look like this:
gremlin> g.V(1).project("id","name").by(id).by('name')
==>[id:1,name:marko]
If you have a mix of vertices that are being projected where some may not have certain properties you can use coalesce() or similar means to ensure a default value:
gremlin> g.V().project('id','name','age').by(id).by('name').by(coalesce(values('age'),constant('none')))
==>[id:1,name:marko,age:45]
==>[id:2,name:vadas,age:27]
==>[id:3,name:lop,age:none]
==>[id:4,name:josh,age:32]
==>[id:5,name:ripple,age:none]
==>[id:6,name:peter,age:35]
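If the raw vertex JSON has already reached application code, the same reshaping can be done client-side. The following is a minimal JavaScript sketch under that assumption, not a replacement for project(); projectVertex and its fallback parameter are hypothetical names, and the property ids are shortened placeholders.

```javascript
// Vertex in the shape shown in the question (property ids shortened).
const vertex = {
  id: "1",
  label: "user",
  type: "vertex",
  properties: {
    name: [{ id: "p1", value: "Murilo" }],
    domain: [{ id: "p2", value: "spaceneedle.com.br" }],
  },
};

// Build a flat object with the vertex id plus the first value of each
// requested property; missing properties get the fallback, similar in
// spirit to coalesce(values(...), constant('none')).
function projectVertex(v, keys, fallback = "none") {
  const result = { id: v.id };
  for (const key of keys) {
    const entries = (v.properties || {})[key];
    result[key] = entries && entries.length > 0 ? entries[0].value : fallback;
  }
  return result;
}

console.log(projectVertex(vertex, ["name"])); // id "1" with name "Murilo"
console.log(projectVertex(vertex, ["name", "age"])); // age falls back to "none"
```

Doing the projection in the traversal is usually preferable because less data crosses the wire; this variant only helps when the full vertex is already in hand.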

Model Firebase realtime database in JSON schema

The Firebase database uses a subset of JSON, so it seems natural to use JSON Schema to describe the data model. That would make it possible to use tools that generate HTML forms, TypeScript models, or random test data from the schema.
My question: How would one model key-value pairs in JSON schema, where the key is an id?
Example: (borrowed from firebase spec)
{
"users": {
"mchen": {
"name": "Mary Chen",
// index Mary's groups in her profile
"groups": {
// the value here doesn't matter, just that the key exists
"alpha": true,
"charlie": true
}
},
...
The group name here is used as a group id. In this reference (the groups object) as well as in the group object itself, the id is used as the property name.
JSON schema for above example is:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"users": {
"type": "object",
"properties": {
"mchen": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"groups": {
"type": "object",
"properties": {
"alpha": {
"type": "boolean"
},
"charlie": {
"type": "boolean"
}
}
}
}
}
}
}
}
}
What I would need for the example is something like the following, where NAME is a placeholder for the property name and NAME_TYPE defines its type.
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"users": {
"type": "object",
"properties": {
NAME: {
"type": "object",
NAME_TYPE: "string",
"properties": {
"name": {
"type": "string"
},
"groups": {
"type": "object",
"properties": {
NAME: {
NAME_TYPE: "string",
"type": "boolean"
}
}
}
}
}
}
}
}
}
(Maybe I am on the completely wrong path here or maybe JSON schema isn't able to model the required structure.)
Firebase does support arrays, but they are situational: use them only in specific cases and generally avoid them.
The Firebase structure you posted is very common, and there are key:value pairs in your structure, so the question is a tad unclear, but I'll give it a shot.
'groups' is the parent key and the values are the children key:value pairs of group1:value, group2:value.
The group1, group2 keys you listed are essentially the same as the ids listed in the first example, other than that it's not an array. Arrays have sequential, hard-coded indexes (0th, 1st, 2nd, etc.), whereas keys in Firebase are open-ended and can generally be set to any alphanumeric value. They are used more to refer to a specific node than to enforce a particular order (I'm speaking generally here).
In the Firebase structure, those keys could be id0, id1, id2... or a,b,c... or a timestamp... or an auto-generated Firebase id (childByAutoId) that would also make them 'sequential'.
However, you can get into trouble assigning your own with id0, id1 etc.
id0
id1
id2
.
id9
id10
id11
The reality here is that the actual order will be
id0
id1
id10
id11
id2
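The ordering above is what plain string comparison produces. A small JavaScript sketch shows the effect, along with one possible workaround (zero-padding the numeric suffix); padKey and its width parameter are hypothetical and not part of any Firebase API.

```javascript
const keys = ["id0", "id1", "id2", "id9", "id10", "id11"];

// Default sort compares strings character by character, so "id10" sorts
// before "id2" because "1" < "2".
const lexicographic = [...keys].sort();
console.log(lexicographic); // id0, id1, id10, id11, id2, id9

// Hypothetical helper: left-pad the trailing digits so that string order
// and numeric order agree.
function padKey(key, width = 4) {
  const match = /^(\D*)(\d+)$/.exec(key);
  if (!match) return key;
  return match[1] + match[2].padStart(width, "0");
}

const padded = keys.map((k) => padKey(k)).sort();
console.log(padded); // id0000, id0001, id0002, id0009, id0010, id0011
```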
The 'key' is that if you are using the keys to read data in sequentially, set them up as such. You may also want to consider generating your keys with childByAutoId (see docs for language specifics) and orderBy one of the child values such as a timestamp or index.
'groups': {
'auto-generated id': {
'name': 'alpha',
'index': 0,
'timestamp': '20160814074146'
...
},
'auto-generated id': {
'name': 'charlie',
'index': 1,
'timestamp': '20160814073600'
...
},
...
}
In the above case, I can orderBy name, index or timestamp.
Ordering by name or index will read the nodes in the order they are listed. If we order by timestamp, the charlie node will be loaded first. Leveraging the child values to orderBy is very flexible.
Also, you can limit the set of data you are loading in with startingAt and endingAt. So for example, you want to load nodes starting at node 10 and ending at node 14. Easily done with non-array JSON data but not easily done if it's stored in an array as the entire array must be read in.
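On the original schema question: JSON Schema draft-04 defines additionalProperties and patternProperties, both of which can hold a schema that applies to keys not known in advance. The sketch below writes such a schema as a JavaScript object for the users/groups example. The validate function is a hypothetical mini-validator that covers only type, properties, and schema-valued additionalProperties, included just to show the idea; a real validator library would be the sensible choice in practice.

```javascript
// `additionalProperties` applies to every key not listed under `properties`,
// so any user id or group id is accepted as a property name.
const schema = {
  type: "object",
  properties: {
    users: {
      type: "object",
      additionalProperties: {
        type: "object",
        properties: {
          name: { type: "string" },
          groups: {
            type: "object",
            additionalProperties: { type: "boolean" },
          },
        },
      },
    },
  },
};

// Hypothetical mini-validator for the small subset of keywords used above.
function validate(value, s) {
  if (s.type !== "object") {
    return typeof value === s.type;
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const props = s.properties || {};
  const extra = s.additionalProperties;
  return Object.keys(value).every((key) => {
    if (Object.prototype.hasOwnProperty.call(props, key)) {
      return validate(value[key], props[key]);
    }
    if (extra !== null && typeof extra === "object") {
      return validate(value[key], extra);
    }
    return extra !== false;
  });
}

const sample = {
  users: { mchen: { name: "Mary Chen", groups: { alpha: true, charlie: true } } },
};
console.log(validate(sample, schema)); // true
```

If the ids follow a known format, patternProperties with a regular expression as the key would constrain the property names as well.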

How to form inner SubQuery in Gremlin Server (Titan 1.0)?

I'm using the following query:
g.V(741440).outE('Notification').order().by('PostedDateLong', decr).range(0,1).as('notificationInfo').match(
__.as('notificationInfo').inV().as('postInfo')
).select('notificationInfo','postInfo')
It gives the following result:
{
"requestId": "9846447c-4217-4103-ac2e-de3536a3c62a",
"status": {
"message": "",
"code": 200,
"attributes": { }
},
"result": {
"data": [
{
"notificationInfo": {
"id": "c0zs-fw3k-347p-g2g0",
"label": "Notification",
"type": "edge",
"inVLabel": "Comment",
"outVLabel": "User",
"inV": 749664,
"outV": 741440,
"properties": {
"ParentPostId": "823488",
"PostedDate": "2016-05-26T02:35:52.3889982Z",
"PostedDateLong": 635998269523889982,
"Type": "CommentedOnPostNotification",
"NotificationInitiatedByVertexId": "1540312"
}
},
"postInfo": {
"id": 749664,
"label": "Comment",
"type": "vertex",
"properties": {
"PostImage": [
{
"id": "amto-g2g0-2wat",
"value": ""
}
],
"PostedByUser": [
{
"id": "am18-g2g0-2txh",
"value": "orbitpage#gmail.com"
}
],
"PostedTime": [
{
"id": "amfg-g2g0-2upx",
"value": "2016-05-26T02:35:39.1489483Z"
}
],
"PostMessage": [
{
"id": "aln0-g2g0-2t51",
"value": "hi"
}
]
}
}
}
],
"meta": { }
}
}
I also want the response to include the vertex referenced by the edge property "NotificationInitiatedByVertexId".
To do that, I tried the following query:
g.V(741440).outE('Notification').order().by('PostedDateLong', decr).range(0,2).as('notificationInfo').match(
__.as('notificationInfo').inV().as('postInfo'),
g.V(1540312).next().as('notificationByUser')
).select('notificationInfo','postInfo','notificationByUser')
Note: I used the vertex id directly in the subquery because I didn't know how to read the value from the edge property dynamically within the query itself.
It gives an error. I have tried a lot but am not able to find a solution.
I'm assuming that you are storing a Titan generated identifier in that edge property called NotificationInitiatedByVertexId. If so, please consider the following even though this first part doesn't really answer your question. I don't think you should store a vertex identifier on the edge. Your graph model should explicitly track the relationship of NotificationInitiatedBy with an edge and by storing the identifier of the vertex on the edge itself you are bypassing that. Also, if you ever have to migrate your data in some way, the ids won't be preserved (Titan will generate new ones) and trying to sort that out will be a mess.
Even if that is not a Titan generated identifier but a logical one you created, I still think I would look to adjust your graph schema and promote that Notification to a vertex. Then your Gremlin traversals would flow more easily.
Now, assuming you don't change that, then I don't see a reason to not just issue two queries in the same request and then combine the results to one data structure. You just need to do a lookup with the vertex id which is going to be pretty fast and inexpensive:
edgeStuff = g.V(741440).outE('Notification').
order().by('PostedDateLong', decr).range(0,1).as('notificationInfo').
... // whatever logic you have
select('notificationInfo','postInfo').next()
vertexStuff = g.V(edgeStuff.get('notificationInfo').value('NotificationInitiatedByVertexId')).next()
[notificationInitiatedBy: vertexStuff, notification: edgeStuff]

Document Db query filter for an attribute that contains an array

With the sample JSON shown below, I am trying to retrieve all documents that contain at least one category (an array element nested under Categories) whose Text value contains 'drinks'. The following query returns an empty result. Can someone help me get this right?
SELECT items.id
,items.description
,items.Categories
FROM items
WHERE ARRAY_CONTAINS(items.Categories.Category.Text, "drink")
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
}, {
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}]
}
},
Note: The JSON is a bit weird in that the array is wrapped by an object. It was converted from XML, hence the result. Please assume I have no control over how this object is saved as JSON.
You need to flatten the document in your query to get the result you want by joining the array back to the main document. The query you want would look like this:
SELECT items.id, items.Categories
FROM items
JOIN Category IN items.Categories.Category
WHERE CONTAINS(LOWER(Category.Text), "drink")
However, because there is no concept of a DISTINCT query, this will produce duplicates equal to the number of Category items that contain the word "drink". So this query would produce your example document twice like this:
[
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [
{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
},
{
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
]
}
},
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [
{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
},
{
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
]
}
}
]
This could be problematic and expensive if the Categories array holds a lot of Category items that have "drink" in them.
You can cut that down if you are only interested in a single Category by changing the query to:
SELECT items.id, Category
FROM items
JOIN Category IN items.Categories.Category
WHERE CONTAINS(LOWER(Category.Text), "drink")
Which would produce a more concise result with only the id field repeated with each matching Category item showing up once:
[{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Category": {
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
}
},
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Category": {
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
}]
Otherwise, you will have to filter the results when you get them back from the query to remove duplicate documents.
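A client-side filter for that could look like the following JavaScript sketch. It assumes the query results are already available as an array of objects with an id field; dedupeById is a hypothetical helper and the Categories payload is abbreviated.

```javascript
// Two rows for the same document, as the JOIN query would return them.
const queryRows = [
  { id: "1dbaf1d0-6549-11a0-88a8-001256957023", Categories: { Category: [] } },
  { id: "1dbaf1d0-6549-11a0-88a8-001256957023", Categories: { Category: [] } },
];

// Keep the first row seen for each id; a Map preserves insertion order.
function dedupeById(rows) {
  const seen = new Map();
  for (const row of rows) {
    if (!seen.has(row.id)) {
      seen.set(row.id, row);
    }
  }
  return [...seen.values()];
}

const uniqueRows = dedupeById(queryRows);
console.log(uniqueRows.length); // 1
```

This removes the duplicates but not their cost: every duplicate row is still returned by the query before it is discarded.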
If it were me and I was building a production system with this requirement, I'd use Azure Search. Here is some info on hooking it up to DocumentDB.
If you don't want to do that and we must live with the constraint that you can't change the shape of the documents, the only way I can think to do this is to use a User Defined Function (UDF) like this:
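function GetItemsWithMatchingCategories(categories, matchingString) {
    // Array.isArray already rejects null, undefined, and plain objects.
    if (!Array.isArray(categories)) {
        return false;
    }
    var lowerMatchingString = matchingString.toLowerCase();
    for (var index = 0; index < categories.length; index++) {
        var category = categories[index];
        // Case-insensitive substring match on the category's Text field.
        var categoryName = category.Text.toLowerCase();
        if (categoryName.indexOf(lowerMatchingString) >= 0) {
            return true;
        }
    }
    // No category matched; return an explicit boolean for the WHERE clause.
    return false;
}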
Note, the code above was modified by the asker after actually trying it out so it's somewhat tested.
You would use it with a query like this:
SELECT * FROM items WHERE udf.GetItemsWithMatchingCategories(items.Categories.Category, "drink")
Also, note that this will result in a full table scan (unless you can combine it with other criteria that can use an index) which may or may not meet your performance/RU limit constraints.
