Cosmos DB query syntax WHERE clause with array in array - azure-cosmosdb

The following json represents two documents in a Cosmos DB container.
How can I write a query that gets any document that has an item with an id of item_1 and value of bar.
I've looked into ARRAY_CONTAINS, but don't get this to work with array's in array's.
Als I've tried somethings with any. Although I can't seem to find any documentation on how to use this, any seems to be a valid function, as I do get formatting highlights in the cosmos db explorer in Azure Portal.
For the any function I tried things like SELECT * FROM c WHERE c.pages.any(p, p.items.any(i, i.id = "item_1" AND i.value = "bar")).
The id fields are unique so if it's easier to find any document that contains any object with the right id and value, that would be fine too.
[
{
"type": "form",
"id": "form_a",
"pages": [
{
"name": "Page 1",
"id": "page_1",
"items": [
{
"id": "item_1",
"value": "foo"
}
]
}
]
},
{
"type": "form",
"id": "form_b",
"pages": [
{
"name": "Page 1",
"id": "page_1",
"items": [
{
"id": "item_1",
"value": "bar"
}
]
}
]
}
]

I think join could handle with WHERE clause with array in array.Please test below sql:
SELECT c.id FROM c
join pages in c.pages
where array_contains(pages.items,{"id": "item_1","value": "bar"},true)
Output:

Related

How to query Cosmos DB to have an array from multiple items in the result set

I have the following content in a container, where device_id is the partition key.
[
{
"id": "hub-01",
"device_id": "device-01",
"created": "2020-12-08T17:47:35",
"cohort": "test"
},
{
"id": "hub-02",
"device_id": "device-01",
"created": "2020-12-08T17:47:36",
"cohort": "test"
},
{
"id": "hub-01",
"device_id": "device-02",
"created": "2020-11-17T20:25:20",
"cohort": "test"
},
{
"id": "hub-01",
"device_id": "device-03",
"created": "2020-11-17T16:05:18",
"cohort": "test"
}
]
How do I query all unique devices, with all their metadata collected into a sub-list, so I get the following result set:
[
{
"device_id": "device-01",
"hubs": [
{
"id": "hub-01",
"created": "2020-12-08T17:47:35",
"cohort": "test"
},
{
"id": "hub-02",
"created": "2020-12-08T17:47:36",
"cohort": "test"
}
]
},
{
"device_id": "device-02",
"hubs": [
{
"id": "hub-01",
"created": "2020-11-17T20:25:20",
"cohort": "test"
}
]
},
{
"device_id": "device-03",
"hubs": [
{
"id": "hub-01",
"created": "2020-11-17T16:05:18",
"cohort": "test"
}
]
}
]
I was experimenting along the lines of the following sub-query, but it does not behave as I would expect:
SELECT
DISTINCT c.device_id,
ARRAY(
SELECT
c2.id,
c2.created,
c2.cohort
FROM c AS c2
WHERE c2.device_id = c.device_id
) as hubs
FROM c
You can create UDF function to handle this.
Here is a similar question I answered from another post.
group data by same timestamp using cosmos db sql
I agree with Mo B. You need to deal with this on your client side. I don't think UDF function can handle this because UDF function can't combine multiple items to one. I think the closest SQL like this:
SELECT
c2.device_id,ARRAY_CONCAT([],c2.hubs)
FROM
(SELECT c.device_id,ARRAY(
SELECT
c.id,
c.created,
c.cohort
FROM c
) as hubs FROM c) as c2
GROUP BY c2.device_id
But ARRAY_CONCAT isn't Aggregate function and there is no Aggregate function can concat array.

Azure CosmosDB SQL query on nested Array

We have a below document structure.
{
"id":"GUID",
"customer": {
"contacts": [
{
"type": "MOBILE",
"status": "CONFIRMED",
"value": "xxxx"
},
{
"type": "EMAIL",
"status": "CONFIRMED",
"value": "aaaa"
}
],
"addresses": [
{
"country": "xxx"
}
]
}
}
and need to search for customer->contacts where value="aaaa".
I tried with below options
1) SELECT c.id FROM c
join customer in c.customer
join contacts in c.customer.contacts
where contacts.value = "aaaa"
2) SELECT c.id FROM c WHERE c.customer.contacts[0].value= "aaaa"
Getting Syntax error 400 bad request Any help highly appreciated
Part of your issue is that value cannot be searched as easily as other properties. Here are possible solutions:
SELECT c.id FROM c
join contacts in c.customer.contacts where contacts["value"] = "aaaa"
SELECT c.id FROM c WHERE c.customer.contacts[1]["value"] = "aaaa"
Document DB SQL Api - unable to query json property with name 'value' and its value is integer

CosmosDB SQL to query "any" child field

Given a document structure like below, where "variants" has N id based sub-entries, I would like to filter on the inner "sku" field. Something akin to this:
SELECT * FROM c WHERE c.variants.?.sku = "some_sku_1"
Here "some_id_1" and "some_id_2" are id values, data driven, and cannot be part of the query.
Is this possible with Cosmos DB and if so, how?
{
"id": "45144",
"variants": {
"some_id_1": {
"sku": "some_sku_1",
"title": "some title 1"
},
"some_id_2": {
"sku": "some_sku_2",
"title": "some title 2"
}
}
}
You can't do that with that schema without using a UDF/SPROC, but if you change the schema slightly, you can do it.
Schema:
{
"id": "45144",
"variants": [
{
"id": "some_id_1",
"sku": "some_sku_1",
"title": "some title 1"
},
{
"id": "some_id_2"
"sku": "some_sku_2",
"title": "some title 2"
}
]
}
Query:
SELECT * FROM c IN Item.variants WHERE c.sku == "some_sku_1"
Check out this article to get a good idea of what's possible with that "IN' statement, which allows you to iterate over objects. https://learn.microsoft.com/en-us/azure/cosmos-db/sql-api-sql-query#Advanced

How can I write a document db query which returns results of sub-documents satisifying any conditions of input list?

Let's assume that I have the following document:
[
{
"name" : "obj1",
"field": [ "Foo1", "Foo3" ]
},
{
"name": "obj2",
"field": [ "Foo2" ]
},
{
"name": "obj3",
"field": [ "Foo3" ]
},
{
"name": "obj4",
"field": [ "Foo1" ]
}
]
I want to write a query which returns obj1, obj3, and obj4 when field = "Foo1" or "Foo3" are searched for. Obviously I can write something like:
SELECT * FROM c WHERE ARRAY_CONTAINS(c.field, "Foo1") OR ARRAY_CONTAINS(c.field, "Foo3")
Though I want to avoid constructing a long query by concatenating query string with ARRAY_CONTAINS for each value in search list.
How can this query be expressed succinctly?
You could rewrite the query using JOIN as follows:
SELECT c
FROM c
JOIN tag IN c.field
WHERE ARRAY_CONTAINS(["Foo1", "Foo3"], tag)
Note that if you have an object with both tags, then it would occur multiple times in the result, and you have to perform distinct/de-duping on the client side.

Document Db query filter for an attribute that contains an array

With the sample json shown below, am trying to retrieve all documents that contains atleast one category which is array object wrapped underneath Categories that has the text value 'drinks' with the following query but the returned result is empty. Can someone help me get this right?
SELECT items.id
,items.description
,items.Categories
FROM items
WHERE ARRAY_CONTAINS(items.Categories.Category.Text, "drink")
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
}, {
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}]
}
},
Note: The json is a bit wierd to have the array wrapped by an object itself - this json was converted from a XML hence the result. So please assume I do not have any control over how this object is saved as json
You need to flatten the document in your query to get the result you want by joining the array back to the main document. The query you want would look like this:
SELECT items.id, items.Categories
FROM items
JOIN Category IN items.Categories.Category
WHERE CONTAINS(LOWER(Category.Text), "drink")
However, because there is no concept of a DISTINCT query, this will produce duplicates equal to the number of Category items that contain the word "drink". So this query would produce your example document twice like this:
[
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [
{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
},
{
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
]
}
},
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Categories": {
"Category": [
{
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
},
{
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
]
}
}
]
This could be problematic and expensive if the Categories array holds a lot of Category items that have "drink" in them.
You can cut that down if you are only interested in a single Category by changing the query to:
SELECT items.id, Category
FROM items
JOIN Category IN items.Categories.Category
WHERE CONTAINS(LOWER(Category.Text), "drink")
Which would produce a more concise result with only the id field repeated with each matching Category item showing up once:
[{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Category": {
"Type": "GS1",
"Id": "10000266",
"Text": "Stimulants/Energy Drinks Ready to Drink"
}
},
{
"id": "1dbaf1d0-6549-11a0-88a8-001256957023",
"Category": {
"Type": "GS2",
"Id": "10000266",
"Text": "Healthy Drink"
}
}]
Otherwise, you will have to filter the results when you get them back from the query to remove duplicate documents.
If it were me and I was building a production system with this requirement, I'd use Azure Search. Here is some info on hooking it up to DocumentDB.
If you don't want to do that and we must live with the constraint that you can't change the shape of the documents, the only way I can think to do this is to use a User Defined Function (UDF) like this:
function GetItemsWithMatchingCategories(categories, matchingString) {
if (Array.isArray(categories) && categories !== null) {
var lowerMatchingString = matchingString.toLowerCase();
for (var index = 0; index < categories.length; index++) {
var category = categories[index];
var categoryName = category.Text.toLowerCase();
if (categoryName.indexOf(lowerMatchingString) >= 0) {
return true;
}
}
}
}
Note, the code above was modified by the asker after actually trying it out so it's somewhat tested.
You would use it with a query like this:
SELECT * FROM items WHERE udf.GetItemsWithMatchingCategories(items.Categories, "drink")
Also, note that this will result in a full table scan (unless you can combine it with other criteria that can use an index) which may or may not meet your performance/RU limit constraints.

Resources