Document Db query filter for an attribute that contains an array - azure-cosmosdb

With the sample json shown below, am trying to retrieve all documents that contains atleast one category which is array object wrapped underneath Categories that has the text value 'drinks' with the following query but the returned result is empty. Can someone help me get this right?
SELECT items.id
,items.description
,items.Categories
FROM items
WHERE ARRAY_CONTAINS(items.Categories.Category.Text, "drink")
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
}, {
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}]
}
},
Note: The json is a bit wierd to have the array wrapped by an object itself - this json was converted from a XML hence the result. So please assume I do not have any control over how this object is saved as json

You need to flatten the document in your query to get the result you want by joining the array back to the main document. The query you want would look like this:
SELECT items.id, items.Categories
FROM items
JOIN Category IN items.Categories.Category
WHERE CONTAINS(LOWER(Category.Text), "drink")
However, because there is no concept of a DISTINCT query, this will produce duplicates equal to the number of Category items that contain the word "drink". So this query would produce your example document twice like this:
[
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [
{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
},
{
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
]
}
},
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [
{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
},
{
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
]
}
}
]
This could be problematic and expensive if the Categories array holds a lot of Category items that have "drink" in them.
You can cut that down if you are only interested in a single Category by changing the query to:
SELECT items.id, Category
FROM items
JOIN Category IN items.Categories.Category
WHERE CONTAINS(LOWER(Category.Text), "drink")
Which would produce a more concise result with only the id field repeated with each matching Category item showing up once:
[{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Category": {
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
}
},
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Category": {
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
}]
Otherwise, you will have to filter the results when you get them back from the query to remove duplicate documents.

If it were me and I was building a production system with this requirement, I'd use Azure Search. Here is some info on hooking it up to DocumentDB.
If you don't want to do that and we must live with the constraint that you can't change the shape of the documents, the only way I can think to do this is to use a User Defined Function (UDF) like this:
function GetItemsWithMatchingCategories(categories, matchingString) {
if (Array.isArray(categories) && categories !== null) {
var lowerMatchingString = matchingString.toLowerCase();
for (var index = 0; index < categories.length; index++) {
var category = categories[index];
var categoryName = category.Text.toLowerCase();
if (categoryName.indexOf(lowerMatchingString) >= 0) {
return true;
}
}
}
}
Note, the code above was modified by the asker after actually trying it out so it's somewhat tested.
You would use it with a query like this:
SELECT * FROM items WHERE udf.GetItemsWithMatchingCategories(items.Categories, "drink")
Also, note that this will result in a full table scan (unless you can combine it with other criteria that can use an index) which may or may not meet your performance/RU limit constraints.

Related

Cosmos DB query syntax WHERE clause with array in array

The following json represents two documents in a Cosmos DB container.
How can I write a query that gets any document that has an item with an id of item_1 and value of bar.
I've looked into ARRAY_CONTAINS, but don't get this to work with array's in array's.
Als I've tried somethings with any. Although I can't seem to find any documentation on how to use this, any seems to be a valid function, as I do get formatting highlights in the cosmos db explorer in Azure Portal.
For the any function I tried things like SELECT * FROM c WHERE c.pages.any(p, p.items.any(i, i.id = "item_1" AND i.value = "bar")).
The id fields are unique so if it's easier to find any document that contains any object with the right id and value, that would be fine too.
[
{
"type": "form",
"id": "form_a",
"pages": [
{
"name": "Page 1",
"id": "page_1",
"items": [
{
"id": "item_1",
"value": "foo"
}
]
}
]
},
{
"type": "form",
"id": "form_b",
"pages": [
{
"name": "Page 1",
"id": "page_1",
"items": [
{
"id": "item_1",
"value": "bar"
}
]
}
]
}
]
I think join could handle with WHERE clause with array in array.Please test below sql:
SELECT c.id FROM c
join pages in c.pages
where array_contains(pages.items,{"id": "item_1","value": "bar"},true)
Output:

CosmosDB SQL to query "any" child field

Given a document structure like below, where "variants" has N id based sub-entries, I would like to filter on the inner "sku" field. Something akin to this:
SELECT * FROM c WHERE c.variants.?.sku = "some_sku_1"
Here "some_id_1" and "some_id_2" are id values, data driven, and cannot be part of the query.
Is this possible with Cosmos DB and if so, how?
{
"id": "45144",
"variants": {
"some_id_1": {
"sku": "some_sku_1",
"title": "some title 1"
},
"some_id_2": {
"sku": "some_sku_2",
"title": "some title 2"
}
}
}
You can't do that with that schema without using a UDF/SPROC, but if you change the schema slightly, you can do it.
Schema:
{
"id": "45144",
"variants": [
{
"id": "some_id_1",
"sku": "some_sku_1",
"title": "some title 1"
},
{
"id": "some_id_2"
"sku": "some_sku_2",
"title": "some title 2"
}
]
}
Query:
SELECT * FROM c IN Item.variants WHERE c.sku == "some_sku_1"
Check out this article to get a good idea of what's possible with that "IN' statement, which allows you to iterate over objects. https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-sql-query#Advanced

How to form inner SubQuery in Gremlin Server (Titan 1.0)?

I'm using Following Query :
g.V(741440).outE('Notification').order().by('PostedDateLong', decr).range(0,1).as('notificationInfo').match(
__.as('notificationInfo').inV().as('postInfo'),
).select('notificationInfo','postInfo')
it is giving following result :
{
"requestId": "9846447c-4217-4103-ac2e-de3536a3c62a",
"status": {
"message": "",
"code": ​200,
"attributes": { }
},
"result": {
"data": [
{
"notificationInfo": {
"id": "c0zs-fw3k-347p-g2g0",
"label": "Notification",
"type": "edge",
"inVLabel": "Comment",
"outVLabel": "User",
"inV": ​749664,
"outV": ​741440,
"properties": {
"ParentPostId": "823488",
"PostedDate": "2016-05-26T02:35:52.3889982Z",
"PostedDateLong": ​635998269523889982,
"Type": "CommentedOnPostNotification",
"NotificationInitiatedByVertexId": "1540312"
}
},
"postInfo": {
"id": ​749664,
"label": "Comment",
"type": "vertex",
"properties": {
"PostImage": [
{
"id": "amto-g2g0-2wat",
"value": ""
}
],
"PostedByUser": [
{
"id": "am18-g2g0-2txh",
"value": "orbitpage#gmail.com"
}
],
"PostedTime": [
{
"id": "amfg-g2g0-2upx",
"value": "2016-05-26T02:35:39.1489483Z"
}
],
"PostMessage": [
{
"id": "aln0-g2g0-2t51",
"value": "hi"
}
]
}
}
}
],
"meta": { }
}
}
I want to get information of Vertex "NotificationInitiatedByVertexId" (Edge Property ) in the response as well.
For that i tried following query :
g.V(741440).outE('Notification').order().by('PostedDateLong', decr).range(0,2).as('notificationInfo').match(
__.as('notificationInfo').inV().as('postInfo'),
g.V(1540312).next().as('notificationByUser')
).select('notificationInfo','postInfo','notificationByUser')
Note : I tried directly with vertex Id in subquery as I wasn't aware how to dynamically get value from edge property in query itself.
It is giving error. I tried a lot but am not able to find any solution.
I'm assuming that you are storing a Titan generated identifier in that edge property called NotificationInitiatedByVertexId. If so, please consider the following even though this first part doesn't really answer your question. I don't think you should store a vertex identifier on the edge. Your graph model should explicitly track the relationship of NotificationInitiatedBy with an edge and by storing the identifier of the vertex on the edge itself you are bypassing that. Also, if you ever have to migrate your data in some way, the ids won't be preserved (Titan will generate new ones) and trying to sort that out will be a mess.
Even if that is not a Titan generated identifier and a logical one you created, I still think I would look to adjust your graph schema and promote that Notification to a vertex. Then your Gremlin traversals would flow more easily.
Now, assuming you don't change that, then I don't see a reason to not just issue two queries in the same request and then combine the results to one data structure. You just need to do a lookup with the vertex id which is going to be pretty fast and inexpensive:
edgeStuff = g.V(741440).outE('Notification').
order().by('PostedDateLong', decr).range(0,1).as('notificationInfo').
... // whatever logic you have
select('notificationInfo','postInfo').next()
vertexStuff = g.V(edgeStuff.get('notificationInfo').value('NotificationInitiatedByVertexId')).next()
[notificationInitiatedBy: vertexStuff, notification: edgeStuff]

JSONAPI - Difference between self and related in a links resource

Why is the self and related references different in the below JSONAPI resource? Aren't they pointing to the same resource? What is the difference between going to /articles/1/relationships/tags and /articles/1/tags?
{
"links": {
"self": "/articles/1/relationships/tags",
"related": "/articles/1/tags"
},
"data": [
{ "type": "tags", "id": "2" },
{ "type": "tags", "id": "3" }
]
}
You can read about that here: https://github.com/json-api/json-api/issues/508.
Basically, with /articles/1/relationships/tags response will be object which represents relationship between articles and tags. The response could be something like this (what you put in your question):
{
"links": {
"self": "/articles/1/relationships/tags",
"related": "/articles/1/tags"
},
"data": [
{ "type": "tags", "id": "2" },
{ "type": "tags", "id": "3" }
]
}
This response gives only the necessary data (in primary data attribute - data) to manipulate the relationship and not resources connected with relationship. That being said, you'll call /articles/1/relationships/tags if you want to create new relationship, add a new tag (basically updating relationship) to article, read which tags belong to article (you only need identity to search them on server) or delete article tags.
On the other hand, calling /articles/1/tags will respond with tags as primary data with all the other properties that they have (articles, relationships, links, and other top-level attributes such include, emphasized text, links and/or jsonapi).
They are different. Here is an example from my project.
Try Get http://localhost:3000/phone-numbers/1/relationships/contact you will get response like this:
{
"links": {
"self": "http://localhost:3000/phone-numbers/1/relationships/contact",
"related": "http://localhost:3000/phone-numbers/1/contact"
},
"data": {
"type": "contacts",
"id": "1"
}
}
You didn't get the attributes and relationships which is probably you want to retrieve.
Then
Try Get http://localhost:3000/phone-numbers/1/contact you will get response like this:
{
"data": {
"id": "1",
"type": "contacts",
"links": {
"self": "http://localhost:3000/contacts/1"
},
"attributes": {
"name-first": "John",
"name-last": "Doe",
"email": "john.doe#boring.test",
"twitter": null
},
"relationships": {
"phone-numbers": {
"links": {
"self": "http://localhost:3000/contacts/1/relationships/phone-numbers",
"related": "http://localhost:3000/contacts/1/phone-numbers"
}
}
}
}
}
You can see you retrieved all the information you want, including the attributes and relationships.
But you should know that relationships can be used for some purpose. Please read http://jsonapi.org/format/#crud-updating-to-one-relationships as a sample.

In Freebase MQL, how can I write a query to get the name a organization's parent?

For a list of organizations I need to get their parents. In freebase.com's query editor, I am using the following query:
{
"id": "/en/daihatsu_motor_company",
"/organization/organization/parent":{id:null}
}​
And I am getting the following result:
{
"code": "/api/status/ok",
"result": {
"/organization/organization/parent": {
"id": "/m/04kjl82"
},
"id": "/en/daihatsu_motor_company"
},
"status": "200 OK",
"transaction_id": "cache;cache03.p01.sjc1:8101;2012-07-10T22:54:06Z;0030"
}
However, I am expecting the id: toyota_motor_corporation.
From freebase.com's query editor, I can click on the id ("id": "/m/04kjl82") which is a link to view with the info I need:
http://www.freebase.com/view/m/04kjl82
How can I get directly the name of the parent company or its id (in the example toyota_motor_corporation)?
Thanks,
Claudio's answer is close, but if you actually want the ID, you need to tweak the query slightly since the default property returned is the name, not ID. This will get you the ID:
{
"id": "/en/daihatsu_motor_company",
"/organization/organization/parent": {
"parent": {"id":null}
}
}​
which will return
"result": {
"/organization/organization/parent": {
"parent": {
"id": "/en/toyota_motor_corporation"
}
},
"id": "/en/daihatsu_motor_company"
}
Having said that, you should consider using the MID instead of the ID since that's the recommended identifier these days.
Your query is returning the id of the organization relationship between a Parent and a Child. What you want is the parent and you can get it with the following query:
{
"id": "/en/daihatsu_motor_company",
"/organization/organization/parent": {
"parent": null
}
}​
which returns
{
"code": "/api/status/ok",
"result": {
"/organization/organization/parent": {
"parent": "Toyota Motor Corporation"
},
"id": "/en/daihatsu_motor_company"
},
"status": "200 OK",
"transaction_id": "cache;cache03.p01.sjc1:8101;2012-07-11T21:50:01Z;0045"
}

Resources