CosmosDb querying in sub-object - azure-cosmosdb

I'm trying to find accounts with an "extra-storage" premium package for a particular user in my Azure Cosmos DB database. Here's what my Account object looks like:
{
"id": 123,
"name": "My Account",
"members": [
{
"id": 333,
"name": "John Smith"
},
{
"id": 555,
"name": "Jane Doe"
}
],
"subscription": {
"type": "great-value",
"startDate": "2022-04-21T16:38:00.0000000Z",
"premiumPackages": [
{
"type": "extra-storage",
"status": "active"
},
{
"type": "video-encoding",
"status": "cancelled"
}
]
}
}
So, my conditions for the query (in English) are:
Account must contain "John Smith" (id: 333) as a member
It must have the "extra-storage" premium package in its subscription
I'm not sure if I can have multiple JOINs, but here's what I've tried, with no results so far:
SELECT c.id, c.name, s.premiumPackages.status
FROM c JOIN m IN c.members
JOIN s IN c.subscription
WHERE CONTAINS(m.id, 333)
AND CONTAINS(s.premiumPackages.type, "extra-storage")
Any idea how I can get accounts with "extra-storage" package for "John Smith"?

This query should give you what you are looking for.
SELECT c.id, c.name, premiumPackages.status
FROM c
JOIN (SELECT VALUE m FROM m IN c.members WHERE m.id = 333)
JOIN (SELECT VALUE s FROM s IN c.subscription.premiumPackages WHERE s.type = "extra-storage") AS premiumPackages
The blog post "Understanding how to query arrays in Azure Cosmos DB" is worth bookmarking for when you write queries against arrays.
P.S. The id at the root of a document in Cosmos DB must be a string, so your "id": 123 should be "id": "123".
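To make the behaviour of the two JOIN subqueries concrete, here is a minimal sketch in plain Python that emulates what the query computes over in-memory dictionaries. It is not Cosmos DB SDK code; the function name find_accounts is hypothetical, and the sample document is the one from the question with the id changed to a string.

```python
# Plain-Python emulation of the JOIN-with-subquery query (illustration only).
account = {
    "id": "123",
    "name": "My Account",
    "members": [
        {"id": 333, "name": "John Smith"},
        {"id": 555, "name": "Jane Doe"},
    ],
    "subscription": {
        "type": "great-value",
        "premiumPackages": [
            {"type": "extra-storage", "status": "active"},
            {"type": "video-encoding", "status": "cancelled"},
        ],
    },
}


def find_accounts(accounts, member_id, package_type):
    """Emit one row per (matching member, matching package) pair."""
    rows = []
    for c in accounts:
        # First subquery: members filtered by id.
        members = [m for m in c.get("members", []) if m.get("id") == member_id]
        # Second subquery: premium packages filtered by type.
        packages = [
            s
            for s in c.get("subscription", {}).get("premiumPackages", [])
            if s.get("type") == package_type
        ]
        # JOIN forms a cross product, so an empty side drops the account.
        for _ in members:
            for pkg in packages:
                rows.append(
                    {"id": c["id"], "name": c["name"], "status": pkg["status"]}
                )
    return rows


rows = find_accounts([account], 333, "extra-storage")
print(rows)
```

Because each JOIN is a cross product with the filtered array, an account appears only when both filters match at least one element.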

Related

Getting user info using JOINs in Cosmos DB

The following is my Company object that I store in Cosmos DB. I have all the essential information about employees in the employees property. I also have a departments property that defines both the departments and their members.
{
"id": "company-a",
"name": "Company A",
"employees": [
{
"id": "john-smith",
"name": "John Smith",
"email": "john.smith@example.com"
},
{
"id": "jane-doe",
"name": "Jane Doe",
"email": "jane.doe@example.com"
},
{
"id": "brian-green",
"name": "Brian Green",
"email": "brian.green@example.com"
}
],
"departments": [
{
"id": "it",
"name": "IT Department",
"members": [
{
"id": "john-smith",
"name": "John Smith",
"isDepartmentHead": true
},
{
"id": "brian-green",
"name": "Brian Green",
"isDepartmentHead": false
}
]
},
{
"id": "hr",
"name": "HR Department",
"members": [
{
"id": "jane-doe",
"name": "Jane Doe",
"isDepartmentHead": true
}
]
}
]
}
I'm trying to return a list of the members of a particular department, including each employee's email, which comes from the employees property.
Here's what I tried, but it includes all employees in the output:
SELECT dm.id, dm.name, e.email, em.isDepartmentHead
FROM Companies c
JOIN d IN c.departments
JOIN dm IN d.members
JOIN e IN c.Employees
WHERE c.id = "company-a" AND d.id = "hr"
The correct output would be:
[
{
"id": "jane-doe",
"name": "Jane Doe",
"email": "jane.doe@example.com",
"isDepartmentHead": true
}
]
How do I form my SQL statement to get all members of a department AND include employees' email addresses?
I'm pretty sure you cannot write a query like this. You are trying to correlate data twice in the same query across two arrays, which I don't think is possible (at least, I've never been successful doing this).
Even if this were possible, there are other issues with your data model. It will not scale. You need to avoid unbounded or very large arrays within documents (e.g. employees and departments), and you do not want to store unrelated data in the same document. The objective is to model data for high-concurrency operations in the way you use it, which reduces both latency and cost.
There are many ways in which you can remodel this data. If this is a very small data set, you could do something like the model below, with a partition key of companyId (assuming that you always query within a single company). This stores all employees for one company in the same logical partition, which can hold up to 20 GB of data. I would also model this so that one document stores data specific to the company itself (address, phone number, number of employees, etc.), with id and companyId having the same value. This lets you store materialized aggregates, such as the number of employees, and update them in a transaction. Also, since this approach mixes different types of entities in one container (a bonus of a NoSQL database), you need a discriminator property that lets you filter for specific entities within the container so you can deserialize them directly into your model classes.
Here is a data model you could try. Please note that you need to determine whether it works for you by scaling it up to the amount of data you believe you will need to store. You also need to test and measure the RU/s cost for the CRUD operations you will execute with high concurrency.
Example company document:
{
"id": "aaaaa",
"companyId": "aaaaa",
"companyName": "Company A",
"type": "company",
"numberOfEmployees": 3,
"addresses": [
{
"address1": "123 Main Street",
"address2": "",
"city": "Los Angeles",
"state": "California",
"zip": "92568"
}
]
}
Then an employee document like this:
{
"id": "jane-doe",
"companyId": "aaaaa",
"type": "employee",
"employeeId": "jane-doe",
"employeeName": "Jane Doe",
"employeeEmail": "jane.doe@example.com",
"departmentId": "hr",
"departmentName": "HR Department",
"isDepartmentHead": true
}
Then last, here's the query to get the data you need.
SELECT
c.employeeId,
c.employeeName,
c.employeeEmail,
c.isDepartmentHead
FROM c
WHERE
c.companyId = "aaaaa" AND
c.type = "employee" AND
c.departmentId = "hr"
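As a rough illustration of how the discriminator property is used, the sketch below filters a mixed list of documents in plain Python and maps the matches onto a model class. It only emulates the query's WHERE clause in memory; the Employee class, the department_members function, and the extra john-smith document are assumptions made for the example, and the companyId value is the one from the example documents.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    employeeId: str
    employeeName: str
    employeeEmail: str
    isDepartmentHead: bool


# Hypothetical container contents: one company document plus employee documents.
container = [
    {"id": "aaaaa", "companyId": "aaaaa", "type": "company",
     "companyName": "Company A", "numberOfEmployees": 3},
    {"id": "jane-doe", "companyId": "aaaaa", "type": "employee",
     "employeeId": "jane-doe", "employeeName": "Jane Doe",
     "employeeEmail": "jane.doe@example.com",
     "departmentId": "hr", "isDepartmentHead": True},
    {"id": "john-smith", "companyId": "aaaaa", "type": "employee",
     "employeeId": "john-smith", "employeeName": "John Smith",
     "employeeEmail": "john.smith@example.com",
     "departmentId": "it", "isDepartmentHead": True},
]


def department_members(docs, company_id, department_id):
    """Emulate the WHERE clause: partition key, discriminator, department."""
    members = []
    for c in docs:
        if (
            c.get("companyId") == company_id
            and c.get("type") == "employee"  # discriminator skips the company doc
            and c.get("departmentId") == department_id
        ):
            members.append(
                Employee(
                    c["employeeId"],
                    c["employeeName"],
                    c["employeeEmail"],
                    c["isDepartmentHead"],
                )
            )
    return members


hr = department_members(container, "aaaaa", "hr")
print(hr)
```

Without the type filter, the company document would also have to be handled by the deserializer, even though it has none of the employee fields.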

Cosmos DB query syntax WHERE clause with array in array

The following JSON represents two documents in a Cosmos DB container.
How can I write a query that gets any document that has an item with an id of item_1 and a value of bar?
I've looked into ARRAY_CONTAINS, but I can't get it to work with arrays in arrays.
I've also tried some things with any. I can't find any documentation on how to use it, but any seems to be a valid function, since it gets syntax highlighting in the Cosmos DB explorer in the Azure Portal.
For the any function I tried things like SELECT * FROM c WHERE c.pages.any(p, p.items.any(i, i.id = "item_1" AND i.value = "bar")).
The id fields are unique, so if it's easier to find any document that contains any object with the right id and value, that would be fine too.
[
{
"type": "form",
"id": "form_a",
"pages": [
{
"name": "Page 1",
"id": "page_1",
"items": [
{
"id": "item_1",
"value": "foo"
}
]
}
]
},
{
"type": "form",
"id": "form_b",
"pages": [
{
"name": "Page 1",
"id": "page_1",
"items": [
{
"id": "item_1",
"value": "bar"
}
]
}
]
}
]
I think a JOIN combined with ARRAY_CONTAINS in the WHERE clause can handle an array inside an array. Please test the SQL below:
SELECT c.id FROM c
join pages in c.pages
where array_contains(pages.items,{"id": "item_1","value": "bar"},true)
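The sketch below emulates, in plain Python, what the JOIN plus ARRAY_CONTAINS with the partial-match flag computes for the two sample documents. It is a simplified model of the semantics rather than SDK code, and the helper names (partial_match, array_contains, matching_form_ids) are hypothetical.

```python
# Two sample documents, trimmed to the fields the query touches.
forms = [
    {"type": "form", "id": "form_a", "pages": [
        {"name": "Page 1", "id": "page_1",
         "items": [{"id": "item_1", "value": "foo"}]},
    ]},
    {"type": "form", "id": "form_b", "pages": [
        {"name": "Page 1", "id": "page_1",
         "items": [{"id": "item_1", "value": "bar"}]},
    ]},
]


def partial_match(element, pattern):
    """True when every key/value pair in pattern also appears in element."""
    return isinstance(element, dict) and all(
        key in element and element[key] == value
        for key, value in pattern.items()
    )


def array_contains(array, pattern):
    """Simplified model of ARRAY_CONTAINS(array, pattern, true)."""
    return isinstance(array, list) and any(
        partial_match(element, pattern) for element in array
    )


def matching_form_ids(docs, pattern):
    ids = []
    for c in docs:
        for page in c.get("pages", []):  # JOIN pages IN c.pages
            if array_contains(page.get("items"), pattern):
                ids.append(c["id"])  # one row per matching page
    return ids


result = matching_form_ids(forms, {"id": "item_1", "value": "bar"})
print(result)
```

Since the JOIN yields one row per page, a form with several matching pages would appear once per page in this emulation.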

Azure CosmosDB SQL query on nested Array

We have the document structure below.
{
"id":"GUID",
"customer": {
"contacts": [
{
"type": "MOBILE",
"status": "CONFIRMED",
"value": "xxxx"
},
{
"type": "EMAIL",
"status": "CONFIRMED",
"value": "aaaa"
}
],
"addresses": [
{
"country": "xxx"
}
]
}
}
We need to search customer->contacts for entries where value="aaaa".
I tried the options below:
1) SELECT c.id FROM c
join customer in c.customer
join contacts in c.customer.contacts
where contacts.value = "aaaa"
2) SELECT c.id FROM c WHERE c.customer.contacts[0].value= "aaaa"
Both give a syntax error (400 Bad Request). Any help is highly appreciated.
Part of your issue is that value is a reserved keyword in the query language, so it cannot be referenced with dot notation like other properties; use bracket notation instead. Here are two possible solutions:
SELECT c.id FROM c
join contacts in c.customer.contacts where contacts["value"] = "aaaa"
SELECT c.id FROM c WHERE c.customer.contacts[1]["value"] = "aaaa"
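The two solutions are not equivalent: the JOIN form looks at every contact, while the indexed form checks a single position. The plain-Python sketch below emulates both over the sample document to show the difference. The function names are hypothetical, and this is an in-memory illustration, not SDK code.

```python
doc = {
    "id": "GUID",
    "customer": {
        "contacts": [
            {"type": "MOBILE", "status": "CONFIRMED", "value": "xxxx"},
            {"type": "EMAIL", "status": "CONFIRMED", "value": "aaaa"},
        ],
        "addresses": [{"country": "xxx"}],
    },
}


def ids_by_any_contact(docs, value):
    """Emulates: JOIN contacts IN c.customer.contacts WHERE contacts["value"] = value."""
    return [
        c["id"]
        for c in docs
        for contact in c["customer"]["contacts"]
        if contact.get("value") == value
    ]


def ids_by_contact_at(docs, index, value):
    """Emulates: WHERE c.customer.contacts[index]["value"] = value."""
    ids = []
    for c in docs:
        contacts = c["customer"]["contacts"]
        # An out-of-range index simply fails the comparison.
        if index < len(contacts) and contacts[index].get("value") == value:
            ids.append(c["id"])
    return ids


print(ids_by_any_contact([doc], "aaaa"))
print(ids_by_contact_at([doc], 1, "aaaa"))
print(ids_by_contact_at([doc], 0, "aaaa"))
```

The indexed form only works while the email contact stays at position 1, so the JOIN form is the safer choice when the order of contacts is not guaranteed.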
See also: Document DB SQL Api - unable to query json property with name 'value' and its value is integer

Azure search service index pointing multiple document db collections

How can I load data from two separate Azure Cosmos DB collections into a single Azure Search index? I need a way to join the data from the two collections, similar to an inner join in SQL, and load the result into the Azure Search service.
I have two collections in Azure Cosmos DB.
One holds products; sample documents are shown below.
{
"description": null,
"links": [],
"replaces": "00000000-0000-0000-0000-000000000000",
"replacedBy": "00000000-0000-0000-0000-000000000000",
"productTypeId": "ccd0bc73-c4a1-41bf-9c96-454a5ba1d025",
"id": "a4853bf5-9c58-4fb5-a1ff-fc3ab575b4c8",
"name": "New Product",
"createDate": "2018-09-19T10:04:35.1951552Z",
"createdBy": "00000000-0000-0000-0000-000000000000",
"updateDate": "2018-10-05T13:46:24.7048358Z",
"updatedBy": "DIJdyXMudaqeAdsw1SiNyJKRIi7Ktio5#clients"
}
{
"description": null,
"links": [],
"replaces": "00000000-0000-0000-0000-000000000000",
"replacedBy": "00000000-0000-0000-0000-000000000000",
"productTypeId": "ccd0bc73-c4a1-41bf-9c96-454a5ba1d025",
"id": "b9b6c3bc-a8f8-470f-ac93-be589eb1da16",
"name": "New Product 2",
"createDate": "2018-09-19T11:02:02.6919008Z",
"createdBy": "00000000-0000-0000-0000-000000000000",
"updateDate": "2018-09-19T11:02:02.6919008Z",
"updatedBy": "00000000-0000-0000-0000-000000000000"
}
{
"description": null,
"links": [],
"replaces": "00000000-0000-0000-0000-000000000000",
"replacedBy": "00000000-0000-0000-0000-000000000000",
"productTypeId": "ccd0bc73-c4a1-41bf-9c96-454a5ba1d025",
"id": "98b3647a-3b40-4a00-bd0f-2a397bd48b68",
"name": "New Product 7",
"createDate": "2018-09-20T09:42:28.2913567Z",
"createdBy": "00000000-0000-0000-0000-000000000000",
"updateDate": "2018-09-20T09:42:28.2913567Z",
"updatedBy": "00000000-0000-0000-0000-000000000000"
}
The other collection holds ProductType documents; a sample document is shown below.
{
"description": null,
"links": null,
"replaces": "00000000-0000-0000-0000-000000000000",
"replacedBy": "00000000-0000-0000-0000-000000000000",
"id": "ccd0bc73-c4a1-41bf-9c96-454a5ba1d025",
"name": "ProductType1_186",
"createDate": "2018-09-18T23:54:43.9395245Z",
"createdBy": "00000000-0000-0000-0000-000000000000",
"updateDate": "2018-10-05T13:29:44.019851Z",
"updatedBy": "DIJdyXMudaqeAdsw1SiNyJKRIi7Ktio5#clients"
}
The product type id is referenced in the product collection, and that is the field that links the two collections.
I want to load the above two collections into the same Azure Search index, and I expect the fields of the index to be populated somewhat like below.
If you use product id as the key, you can simply point two indexers at the same index, and Azure Search will merge the documents automatically. For example, here are two indexer definitions that would merge their data into the same index:
{
"name" : "productIndexer",
"dataSourceName" : "productDataSource",
"targetIndexName" : "combinedIndex",
"schedule" : { "interval" : "PT2H" }
}
{
"name" : "sampleIndexer",
"dataSourceName" : "sampleDataSource",
"targetIndexName" : "combinedIndex",
"schedule" : { "interval" : "PT2H" }
}
Learn more about the create indexer api here
However, it appears that the two collections share the same fields. This means that the fields from the document which gets indexed last will replace the fields from the document that got indexed first. To avoid this, I would recommend replacing the fields that match the 00000000-0000-0000-0000-000000000000 pattern with null in your Cosmos DB query. For example:
SELECT p.productTypeId, (p.createdBy != "00000000-0000-0000-0000-000000000000" ? p.createdBy : null) AS createdBy FROM products p
This exact query may not work for your use case. See the query syntax reference for more information.
Please let me know if you have any questions, or something is not working as expected.
Thanks
Matt

WHERE clause on an array in Azure DocumentDb

Given an Azure DocumentDB document like this:
{
"id": "WakefieldFamily",
"parents": [
{ "familyName": "Wakefield", "givenName": "Robin" },
{ "familyName": "Miller", "givenName": "Ben" }
],
"children": [
{
"familyName": "Merriam",
"givenName": "Jesse",
"gender": "female",
"grade": 1,
"pets": [
{ "givenName": "Goofy" },
{ "givenName": "Shadow" }
]
},
{
"familyName": "Miller",
"givenName": "Lisa",
"gender": "female",
"grade": 8
}
],
"address": { "state": "NY", "county": "Manhattan", "city": "NY" },
"isRegistered": false
};
How do I query to get the children who have a pet whose given name is "Goofy"?
It looks like the following syntax is invalid:
Select * from root r
WHERE r.children.pets.givenName="Goofy"
Instead I need to do
Select * from root r
WHERE r.children[0].pets[0].givenName="Goofy"
which is not really searching through an array.
Any suggestions on how I should handle queries like these?
You should take advantage of DocumentDB's JOIN clause, which operates a bit differently from JOIN in an RDBMS (since DocumentDB deals with a denormalized data model of schema-free documents).
To put it simply, you can think of DocumentDB's JOIN as a self-join that forms cross products between nested JSON objects.
To query the children who have a pet whose given name is "Goofy", you can try:
SELECT
f.id AS familyName,
c AS child,
p.givenName AS petName
FROM Families f
JOIN c IN f.children
JOIN p IN c.pets
WHERE p.givenName = "Goofy"
Which returns:
[{
"familyName": "WakefieldFamily",
"child": {
"familyName": "Merriam",
"givenName": "Jesse",
"gender": "female",
"grade": 1,
"pets": [{
"givenName": "Goofy"
}, {
"givenName": "Shadow"
}]
},
"petName": "Goofy"
}]
Reference: http://azure.microsoft.com/en-us/documentation/articles/documentdb-sql-query/
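One way to picture the cross product that the two JOINs form is as nested loops. The plain-Python sketch below emulates the query over the sample document, trimmed to the fields it uses. It illustrates the semantics only; the function name children_with_pet is hypothetical.

```python
family = {
    "id": "WakefieldFamily",
    "children": [
        {
            "familyName": "Merriam",
            "givenName": "Jesse",
            "gender": "female",
            "grade": 1,
            "pets": [{"givenName": "Goofy"}, {"givenName": "Shadow"}],
        },
        {"familyName": "Miller", "givenName": "Lisa", "gender": "female", "grade": 8},
    ],
}


def children_with_pet(families, pet_name):
    rows = []
    for f in families:
        for c in f.get("children", []):  # JOIN c IN f.children
            # JOIN p IN c.pets: a child with no pets contributes no rows.
            for p in c.get("pets", []):
                if p.get("givenName") == pet_name:  # WHERE p.givenName = ...
                    rows.append(
                        {"familyName": f["id"], "child": c, "petName": p["givenName"]}
                    )
    return rows


matches = children_with_pet([family], "Goofy")
print(matches)
```

Lisa has no pets array, so the inner loop never runs for her and she cannot appear in the result, which matches the way the JOIN drops her.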
Edit:
You can also use the ARRAY_CONTAINS function, which looks something like this:
SELECT food.id, food.description, food.tags
FROM food
WHERE food.id = "09052" or ARRAY_CONTAINS(food.tags.name, "blueberries")
I think the ARRAY_CONTAINS function has changed since this was answered in 2014. I had to use the following for it to work.
SELECT * FROM c
WHERE ARRAY_CONTAINS(c.Samples, {"TimeBasis":"5MIN_AV", "Value":"5.105"},true)
Samples is my JSON array and it contains objects with many properties including the two above.
