How to query data in Cosmos db from nested json - azure-cosmosdb

I am having difficulty writing a query to retrieve data from nested JSON in Cosmos DB.
Sample JSON:
{
"id": "xyz",
"items": [
{
"arr_id": 1,
"randomval": "abc"
},
{
"arr_id": 2,
"randomval": "mno"
},
{
"arr_id": 1,
"randomval": "xyz"
}
]
}
Let's say, in the above case, I want to get all the JSON data with arr_id = 1.
Expected result:
{
"id": "xyz",
"items": [
{
"arr_id": 1,
"randomval": "abc"
},
{
"arr_id": 1,
"randomval": "xyz"
}
]
}
If I write a query like the one below, it still gives me the entire JSON.
SELECT * FROM c WHERE ARRAY_CONTAINS(c.items, {"arr_id": 1}, true)
I want it to filter at the items level too. I guess it only filters at the document level and returns the entire JSON whenever even a single arr_id matches.

You can use either
SELECT c.id, ARRAY(SELECT VALUE i FROM i in c.items where i.arr_id = 1) as items
FROM c
WHERE EXISTS(SELECT VALUE i FROM i in c.items where i.arr_id = 1)
or
SELECT c.id, ARRAY(SELECT VALUE i FROM i in c.items where i.arr_id = 1) as items
FROM c
depending on whether you want to filter out those records completely when no array item with arr_id = 1 exists (the first query) or you expect an empty array in that case (the second query).
Also see this link for a good overview of query options across arrays - https://devblogs.microsoft.com/cosmosdb/understanding-how-to-query-arrays-in-azure-cosmos-db/
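To make the difference between the two queries concrete, here is a minimal in-memory sketch in Python. It does not call Cosmos DB; it only mimics what the two queries return. The second document (`no-match`) and the helper names `filter_items` and `project` are made up for illustration.

```python
# In-memory illustration of the two queries above; no Cosmos DB SDK is used.
docs = [
    {"id": "xyz", "items": [
        {"arr_id": 1, "randomval": "abc"},
        {"arr_id": 2, "randomval": "mno"},
        {"arr_id": 1, "randomval": "xyz"},
    ]},
    # Hypothetical extra document with no arr_id = 1 item.
    {"id": "no-match", "items": [{"arr_id": 3, "randomval": "pqr"}]},
]


def filter_items(doc, arr_id):
    """Mimics ARRAY(SELECT VALUE i FROM i IN c.items WHERE i.arr_id = <arr_id>)."""
    return [i for i in doc.get("items", []) if i.get("arr_id") == arr_id]


def project(docs, arr_id, require_match):
    """require_match=True mimics the variant with WHERE EXISTS(...)."""
    result = []
    for doc in docs:
        items = filter_items(doc, arr_id)
        if require_match and not items:
            continue  # the document is dropped entirely
        result.append({"id": doc["id"], "items": items})
    return result


with_exists = project(docs, 1, require_match=True)
without_exists = project(docs, 1, require_match=False)
```

With the EXISTS variant only the `xyz` document comes back; without it, `no-match` is also returned, with an empty `items` array.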

Related

What's the use case for a simple JOIN without "IN" in Cosmos DB SQL?

I'm trying to understand the documentation about JOIN in Cosmos DB SQL.
In the sample JSON, each family object has a children property, like this:
First family object:
"id": "AndersenFamily",
"lastName": "Andersen",
"children": [
{
"firstName": "Henriette Thaulow",
"grade": 5
}
],
Second family object:
"id": "WakefieldFamily",
"children": [
{
"givenName": "Jesse"
}
],
Then, this JOIN operation is shown:
SELECT f.id
FROM Families f
JOIN f.children
The obvious result is the identifier of each family object.
[
{
"id": "AndersenFamily"
},
{
"id": "WakefieldFamily"
}
]
If the "JOIN" is removed, the result is exactly the same. Even if I wanted to project the children, I could just use SELECT f.id, f.children, and there would be no reason to join f.children.
The only difference I observe is when a family object doesn't have a children property. In that case, joining on f.children excludes the family object from the results.
So what is the point of a JOIN in Cosmos DB SQL without combining it with IN? Are there any real use cases for it?
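The behaviour described in this question can be modelled with a small Python sketch. It is only a model of what the asker observed, not the Cosmos DB engine: `NoChildrenFamily` is a hypothetical third document, and both function names are invented for illustration.

```python
# Model of the observed behaviour; this does not call Cosmos DB.
families = [
    {"id": "AndersenFamily", "lastName": "Andersen",
     "children": [{"firstName": "Henriette Thaulow", "grade": 5}]},
    {"id": "WakefieldFamily", "children": [{"givenName": "Jesse"}]},
    {"id": "NoChildrenFamily"},  # hypothetical document without a children property
]


def join_whole_property(docs, prop):
    """Models JOIN f.<prop>: one row per document in which the property is defined."""
    return [{"id": d["id"]} for d in docs if prop in d]


def join_in_property(docs, prop):
    """Models JOIN x IN f.<prop>: one row per element of the array."""
    return [{"id": d["id"]} for d in docs for _ in d.get(prop, [])]


plain_join = join_whole_property(families, "children")
in_join = join_in_property(families, "children")
```

In this model the plain JOIN acts as a filter on whether the property exists, which matches the one difference noted in the question.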

Union Results of Multiple OPENJSON calls

I have a table that stores 1 json object per row, I want to call OPENJSON on the json object stored in each row and union together all of the results. I don't know the number of rows I will have ahead of time.
Here is some example data to reference
DROP TABLE IF EXISTS #tmp_json_tbl
DECLARE @json1 NVARCHAR(2048) = N'{
"members": [
{
"name": "bob",
"status": "subscribed"
},
{
"name": "larry",
"status": "unsubscribed"
}
]
}';
SELECT @json1 as json_obj, 1 as jid into #tmp_json_tbl
INSERT into #tmp_json_tbl
VALUES ( N'{
"members": [
{
"name": "bob",
"status": "subscribed"
},
{
"name": "larry",
"status": "unsubscribed"
}
]
}',2 );
SELECT * from #tmp_json_tbl
--how can i recursively union together results for all values of jid?
-- I could use a cursor but I would rather figure out a way to do it using a recursive cte
SELECT * FROM OpenJson((SELECT json_obj from #tmp_json_tbl where jid=1), '$.members')
WITH (
name VARCHAR(80) '$.name',
mstatus varchar(100) '$.status'
)
This is what I wanted
SELECT name, m_status
FROM #tmp_json_tbl
CROSS APPLY OPENJSON(json_obj, '$.members')
WITH (
name VARCHAR(80) '$.name',
m_status varchar(100) '$.status'
)
Found my answer here: How to use OPENJSON on multiple rows
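What CROSS APPLY OPENJSON does here can be sketched in Python: for every row, parse the JSON column and emit one output row per element of `members`. This is only an in-memory illustration; the `rows` list stands in for the temp table, and `cross_apply_members` is a made-up helper name.

```python
import json

# Stand-in for the temp table: (jid, json_obj) pairs.
MEMBERS_JSON = (
    '{"members": ['
    '{"name": "bob", "status": "subscribed"},'
    '{"name": "larry", "status": "unsubscribed"}]}'
)
rows = [(1, MEMBERS_JSON), (2, MEMBERS_JSON)]


def cross_apply_members(rows):
    """For each row, expand $.members into one (jid, name, status) tuple per member."""
    out = []
    for jid, json_obj in rows:
        for member in json.loads(json_obj).get("members", []):
            out.append((jid, member.get("name"), member.get("status")))
    return out


flattened = cross_apply_members(rows)
```

The result is the union over all rows without knowing the row count in advance, which is why no cursor or recursive CTE is needed.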

Cosmos DB SQL on nested array without property name

Assume documents of the following schema
{
"id": 1,
"children":[
{"id": "a", "name":"al"},
{"id": "b", "name":"bob"}
]
}
I want to return an array of arrays of all children, filtered on the id property at the root level. Below are the known alternatives and their limitations:
SELECT * FROM c.children
The SQL above returns the array of arrays in the right shape, but it doesn't allow me to filter on the id at the root level of the document.
SELECT c.children FROM c WHERE c.id >= 1
The above allows the filtering but returns an array of objects, each with a "children" property containing the array.
SELECT child.id, child.name FROM c JOIN child in c.children WHERE c.id >= 1
The above allows the filtering but returns an array of objects. Unlike the previous example, the objects are flattened to the child level, i.e. the enclosing "children" property is not present.
Again, the ordering and grouping of children in the returned arrays are important on the client side, hence the desire to return all children of a parent grouped into an array. The first query accomplishes that but doesn't allow filtering.
Please try this SQL:
SELECT value c.children FROM c WHERE c.id >= 1
Result:
[
[
{
"id": "a",
"name": "al"
},
{
"id": "b",
"name": "bob"
}
]
]
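The effect of the VALUE keyword on the result shape can be illustrated with a short in-memory Python sketch. It does not query Cosmos DB, and the second document (id 0) is a hypothetical one added only to show the filter.

```python
# In-memory illustration of SELECT c.children vs SELECT VALUE c.children.
docs = [
    {"id": 1, "children": [{"id": "a", "name": "al"}, {"id": "b", "name": "bob"}]},
    {"id": 0, "children": [{"id": "z", "name": "zed"}]},  # hypothetical; excluded by id >= 1
]

# Without VALUE: each result is an object wrapping the array in a "children" property.
wrapped = [{"children": d["children"]} for d in docs if d["id"] >= 1]

# With VALUE: each result is the bare array, giving an array of arrays.
unwrapped = [d["children"] for d in docs if d["id"] >= 1]
```

The root-level filter still applies in both cases; only the wrapper object around each array disappears.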

Limit number of documents in a partition for cosmosdb

I have a Cosmos DB collection in which each partition contains a set of documents. I would like to maintain the collection so that a logical partition ('id' in this case) never holds more than 5 documents. In the sample below, when a sixth entry (say for 8/11/2020) is added, I want to delete the document last updated on 7/13/2020, since it is the oldest.
Basically, I want to make sure that for the item with id 12345 there are only the 5 latest entries and no more. This is to reduce the data in the database and thus avoid querying more data than needed.
{
"id": 12345,
"lastUpdated": "8/10/2020"
},
{
"id": 12345,
"lastUpdated": "8/3/2020"
},
{
"id": 12345,
"lastUpdated": "7/27/2020"
},
{
"id": 12345,
"lastUpdated": "7/20/2020"
},
{
"id": 12345,
"lastUpdated": "7/13/2020"
}
I could do something like this:
Get all documents for id 12345.
If the count of documents is >= 5, get the oldest document (the fifth) and delete it.
Insert the new document.
However, that means running three queries to insert a single document.
Is there a more elegant way to do this?
Thanks!
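You can use ORDER BY f.lastUpdated DESC with OFFSET 0 LIMIT 5 to get the 5 latest entries. For more details, see the official documentation on the OFFSET LIMIT clause in Azure Cosmos DB.
To find the older entries, skip the 5 latest and take the rest (assume at most 100), then either set a ttl on them or delete them directly. The query looks like this:
SELECT f.id, f.lastUpdated FROM yourcosmosdb f ORDER BY f.lastUpdated DESC OFFSET 5 LIMIT 100
Then loop over the results and delete each one:
// Requires: using System.Collections.Generic; using System.Threading.Tasks; using Microsoft.Azure.Cosmos;
// `response` is the document class and `deviceid` its partition key property.
List<Task> concurrentDeleteTasks = new List<Task>();
while (feedIterator.HasMoreResults)
{
FeedResponse<response> res = await feedIterator.ReadNextAsync();
foreach (var item in res)
{
// Start each delete without awaiting it so the deletes run concurrently.
concurrentDeleteTasks.Add(container.DeleteItemAsync<response>(item.id, new PartitionKey(item.deviceid)));
}
}
// Wait for every delete to finish, not just the first few.
await Task.WhenAll(concurrentDeleteTasks);
Alternatively, you can loop over the results and set ttl=10 on each document; they will be deleted 10 seconds later (this requires TTL to be enabled on the container).
To get the 5 latest entries:
SELECT f.id, f.lastUpdated FROM yourcosmosdb f ORDER BY f.lastUpdated DESC OFFSET 0 LIMIT 5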
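The keep-the-latest-five logic can be sketched in memory with Python. This is only an illustration of the selection step, not Cosmos DB code; `split_latest` is a made-up helper, and the dates are represented as `date` objects, which is an assumption about how `lastUpdated` is stored.

```python
from datetime import date


def split_latest(docs, keep=5):
    """Return (kept, to_delete): the `keep` newest documents and everything older."""
    ordered = sorted(docs, key=lambda d: d["lastUpdated"], reverse=True)
    return ordered[:keep], ordered[keep:]


# The five sample entries plus the new sixth entry for 8/11/2020.
docs = [
    {"id": 12345, "lastUpdated": date(2020, month, day)}
    for month, day in [(8, 10), (8, 3), (7, 27), (7, 20), (7, 13), (8, 11)]
]

kept, to_delete = split_latest(docs)
```

Note that date strings in the M/D/YYYY form do not sort chronologically as plain strings, so ordering by `lastUpdated` only works as intended if the value is stored in a sortable form such as ISO 8601 (`2020-08-10`) or a numeric timestamp.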

Return the content of a specific object in an array — CosmosDB

This is a follow up to question 56126817
My current query
SELECT c.EventType.EndDeviceEventDetail FROM c
WHERE c.EventType.EndDeviceEventType.eventOrAction = '93'
AND c.EventType.EndDeviceEventType.subdomain = '137'
AND c.EventType.EndDeviceEventType.domain = '26'
AND c.EventType.EndDeviceEventType.type = '3'
AND ARRAY_CONTAINS(c.EventType.EndDeviceEventDetail,{"name":
"RCDSwitchReleased","value": "true" })
My Query Output
[
{
"EndDeviceEventDetail": [
{
"name": "Spontaneous",
"value": "true"
},
{
"name": "DetectionActive",
"value": "true"
},
{
"name": "RCDSwitchReleased",
"value": "true"
}
]
}
]
Question
How could I change my query so that I select only the "value" of the array entry whose "name" is "DetectionActive"?
The idea is to filter the query on one array entry and get as output the "value" of another array entry. From reading here, a UDF (not the best option in this case) or a JOIN should be used.
First attempt
SELECT t.value FROM c JOIN t in c.EventType.EndDeviceEventDetail
WHERE c.EventType.EndDeviceEventType.eventOrAction = '93'
AND c.EventType.EndDeviceEventType.subdomain = '137'
AND c.EventType.EndDeviceEventType.domain = '26'
AND c.EventType.EndDeviceEventType.type = '3'
AND ARRAY_CONTAINS(c.EventType.EndDeviceEventDetail,{"name":
"RCDSwitchReleased","value": "true" })
This gets a Bad Request (400) error.
Your idea and direction are absolutely right. I simplified and tested your SQL.
SELECT detail.value FROM c
join detail in c.EventType.EndDeviceEventDetail
WHERE c.EventType.EndDeviceEventType.eventOrAction = '93'
AND ARRAY_CONTAINS(c.EventType.EndDeviceEventDetail,{"name":
"RCDSwitchReleased","value": "true" })
This reproduced the error.
That is because value is a reserved word in Cosmos DB SQL syntax. Please refer to this case: Using reserved word field name in DocumentDB
You could modify the SQL like this:
SELECT detail["value"] FROM c
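The intent of the query (filter on one array entry, return the "value" of another) can be modelled in memory with Python. This sketch does not call Cosmos DB. The helper name `detail_value` is made up, and the filter on the entry's name is an assumption about what the asker wants rather than something stated in the answer.

```python
# In-memory model of "filter on one array entry, output the value of another".
doc = {
    "EventType": {
        "EndDeviceEventType": {"eventOrAction": "93"},
        "EndDeviceEventDetail": [
            {"name": "Spontaneous", "value": "true"},
            {"name": "DetectionActive", "value": "true"},
            {"name": "RCDSwitchReleased", "value": "true"},
        ],
    }
}


def detail_value(doc, must_contain, wanted_name):
    """Mimics ARRAY_CONTAINS(details, must_contain) plus a join over the details."""
    details = doc["EventType"]["EndDeviceEventDetail"]
    if must_contain not in details:  # exact-match check, like ARRAY_CONTAINS
        return []
    # Assumed extra filter: keep only the entry with the wanted name.
    return [{"value": d["value"]} for d in details if d["name"] == wanted_name]


result = detail_value(
    doc,
    must_contain={"name": "RCDSwitchReleased", "value": "true"},
    wanted_name="DetectionActive",
)
```

In the Python model, `d["value"]` plays the role of the bracket notation `detail["value"]`, which is what sidesteps the reserved word in the query.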
