Limit number of documents in a partition for cosmosdb - azure-cosmosdb

I have a Cosmos DB collection in which each logical partition contains a set of documents. I would like to maintain the collection so that a logical partition ('id' in this case) never holds more than 5 documents. In the sample below, when a sixth entry (say for 8/11/2020) is added, I want to delete the document created on 7/13/2020, since it was updated the earliest.
Basically, for the item with id 12345 I want to keep only the 5 latest entries and no more. This reduces the data in the database and avoids querying more data than needed.
[
{
"id": 12345,
"lastUpdated": "8/10/2020"
},
{
"id": 12345,
"lastUpdated": "8/3/2020"
},
{
"id": 12345,
"lastUpdated": "7/27/2020"
},
{
"id": 12345,
"lastUpdated": "7/20/2020"
},
{
"id": 12345,
"lastUpdated": "7/13/2020"
}
]
I could do something like this:
1. Get all documents for id 12345.
2. If there are already 5 or more documents, get the oldest one (the 5th in lastUpdated order) and delete it.
3. Insert the new document.
However, that means running 3 queries to insert a single document.
Is there a more elegant way to do this?
Thanks!

You can use ORDER BY f.lastUpdated DESC OFFSET 0 LIMIT 5 to get the 5 latest entries (OFFSET is zero-based). For more details, see the official documentation on the OFFSET LIMIT clause in Azure Cosmos DB.
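A minimal Python sketch of what ORDER BY ... DESC OFFSET m LIMIT n selects, modeled in memory rather than run against Cosmos DB. The helper offset_limit is hypothetical, and the sample dates are written as ISO 8601 strings (an assumption; the question uses M/D/YYYY) so that string order matches date order:

```python
from typing import Any


def offset_limit(docs: list[dict[str, Any]], key: str, offset: int, limit: int) -> list[dict[str, Any]]:
    """Mimic ORDER BY <key> DESC OFFSET <offset> LIMIT <limit> (hypothetical helper)."""
    ordered = sorted(docs, key=lambda d: d[key], reverse=True)  # newest first
    return ordered[offset:offset + limit]  # OFFSET skips rows (zero-based), LIMIT caps the count


docs = [{"lastUpdated": d} for d in
        ["2020-08-11", "2020-08-10", "2020-08-03", "2020-07-27", "2020-07-20", "2020-07-13"]]

latest_five = offset_limit(docs, "lastUpdated", 0, 5)  # the 5 newest
to_delete = offset_limit(docs, "lastUpdated", 5, 100)  # everything older than those
print([d["lastUpdated"] for d in to_delete])  # ['2020-07-13']
```

With an offset of 1, the sketch starts at 2020-08-10, i.e. it skips the newest document.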
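To trim a partition, query everything except the 5 newest documents (LIMIT only needs to be large enough, e.g. 100), then either delete those documents directly or set a ttl on them. The query looks like this:
SELECT f.id, f.lastUpdated FROM yourcosmosdb f ORDER BY f.lastUpdated DESC OFFSET 5 LIMIT 100
Then iterate over the results and delete each document: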
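using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
// Assumes: container (a Container), document class response, partition key value deviceId.
// Skips the 5 newest documents; the partition key is projected so the delete can use it.
QueryDefinition query = new QueryDefinition(
    "SELECT f.id, f.deviceid, f.lastUpdated FROM f WHERE f.deviceid = @pk ORDER BY f.lastUpdated DESC OFFSET 5 LIMIT 100")
    .WithParameter("@pk", deviceId);
FeedIterator<response> feedIterator = container.GetItemQueryIterator<response>(query);
List<Task> concurrentDeleteTasks = new List<Task>();
while (feedIterator.HasMoreResults)
{
    FeedResponse<response> res = await feedIterator.ReadNextAsync();
    foreach (var item in res)
    {
        concurrentDeleteTasks.Add(container.DeleteItemAsync<response>(item.id, new PartitionKey(item.deviceid)));
    }
}
// Wait for every delete to finish
await Task.WhenAll(concurrentDeleteTasks);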
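Alternatively, iterate over the same results and set ttl = 10 on each document; Cosmos DB then deletes them automatically about 10 seconds later. This requires time-to-live to be enabled on the container.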
To get the 5 latest documents:
SELECT f.id, f.lastUpdated FROM yourcosmosdb f ORDER BY f.lastUpdated DESC OFFSET 0 LIMIT 5
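One caveat with the queries above: if lastUpdated is stored as an M/D/YYYY string, as in the question's sample, ORDER BY compares it as text, so "8/3/2020" ranks above "8/11/2020". Below is a minimal Python sketch of picking the documents to trim by parsing the dates first; docs_to_trim and the docId field are hypothetical names used only for illustration. Storing dates as ISO 8601 strings avoids the issue in the query itself.

```python
from datetime import datetime
from typing import Any

KEEP = 5  # maximum number of documents to retain per logical partition


def docs_to_trim(docs: list[dict[str, Any]], keep: int = KEEP) -> list[dict[str, Any]]:
    """Return every document except the `keep` most recently updated ones."""

    def parsed(doc: dict[str, Any]) -> datetime:
        # Assumes lastUpdated is an M/D/YYYY string, as in the question's sample.
        return datetime.strptime(doc["lastUpdated"], "%m/%d/%Y")

    newest_first = sorted(docs, key=parsed, reverse=True)
    return newest_first[keep:]


# docId is a hypothetical per-document key used only to identify what to delete.
partition = [{"docId": i, "lastUpdated": d} for i, d in enumerate(
    ["8/10/2020", "8/3/2020", "7/27/2020", "7/20/2020", "7/13/2020", "8/11/2020"])]

print([d["lastUpdated"] for d in docs_to_trim(partition)])  # ['7/13/2020']
# Plain string comparison ranks '8/3/2020' as the "latest" date.
print(max(d["lastUpdated"] for d in partition))  # 8/3/2020
```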

Related

How to query data in Cosmos db from nested json

I'm having difficulty writing a query to get data from nested JSON in Cosmos DB.
Sample JSON:
{
"id": "xyz",
"items": [
{
"arr_id": 1,
"randomval": "abc"
},
{
"arr_id": 2,
"randomval": "mno"
},
{
"arr_id": 1,
"randomval": "xyz"
}
]
}
Let's say that in the case above I want to get all the JSON data with arr_id = 1.
Expected result:
{
"id": "xyz",
"items": [
{
"arr_id": 1,
"randomval": "abc"
},
{
"arr_id": 1,
"randomval": "xyz"
}
]
}
If I write a query like the one below, it still gives me the entire JSON:
Select * from c where ARRAY_CONTAINS(c.items, {"arr_id": 1},true)
I want it to filter at the items level too. I guess it only filters at the document level and returns the entire JSON whenever even a single arr_id matches.
You can use either
SELECT c.id, ARRAY(SELECT VALUE i FROM i in c.items where i.arr_id = 1) as items
FROM c
WHERE EXISTS(SELECT VALUE i FROM i in c.items where i.arr_id = 1)
or
SELECT c.id, ARRAY(SELECT VALUE i FROM i in c.items where i.arr_id = 1) as items
FROM c
depending on whether you want documents with no item where arr_id = 1 to be filtered out completely (first query) or returned with an empty items array (second query).
Also see this link for a good overview of query options across arrays - https://devblogs.microsoft.com/cosmosdb/understanding-how-to-query-arrays-in-azure-cosmos-db/
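To make the difference between the two queries concrete, here is a small in-memory Python model of them (not the Cosmos DB engine; the function filter_items and the second document "other" are hypothetical):

```python
from typing import Any


def filter_items(docs: list[dict[str, Any]], arr_id: int, drop_empty: bool) -> list[dict[str, Any]]:
    """Mimic SELECT c.id, ARRAY(SELECT VALUE i FROM i IN c.items WHERE i.arr_id = @arr_id) AS items.

    drop_empty=True models the extra WHERE EXISTS(...) filter of the first query.
    """
    results = []
    for doc in docs:
        items = [i for i in doc.get("items", []) if i.get("arr_id") == arr_id]
        if drop_empty and not items:
            continue  # EXISTS(...) is false, so the whole document is skipped
        results.append({"id": doc["id"], "items": items})
    return results


docs = [
    {"id": "xyz", "items": [{"arr_id": 1, "randomval": "abc"},
                            {"arr_id": 2, "randomval": "mno"},
                            {"arr_id": 1, "randomval": "xyz"}]},
    {"id": "other", "items": [{"arr_id": 2, "randomval": "def"}]},  # hypothetical
]

with_exists = filter_items(docs, 1, drop_empty=True)      # only "xyz"
without_exists = filter_items(docs, 1, drop_empty=False)  # "xyz" plus "other" with items == []
print(with_exists)
print(without_exists)
```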

Cosmos DB query on key-value pairs

I have a large collection of json documents whose structure is in the form:
{
"id": "00000000-0000-0000-0000-000000001122",
"typeId": 0,
"projectId": "p001",
"properties": [
{
"id": "a6fdd321-562c-4a40-97c7-4a34c097033d",
"name": "projectName",
"value": "contoso"
},
{
"id": "d3b5d3b6-66de-47b5-894b-cdecfc8afc40",
"name": "status",
"value": "open"
},
.....{etc}
]
}
There may be a lot of properties in the collection, all identified by the value of name. The fields in properties are pretty consistent -- there may be some variability, but they will all have the fields that I care about. There's an Id, some labels, etc
I'm wanting to combine these with some other data in PowerBI using the projectId to create some very valuable reports.
I think what I want to do is 'normalize' this data into a table, like:
| ProjectId | projectName | status | openDate | closeDate | manager |
|-----------|-------------|--------|----------|-----------|---------|
| p001      | contoso     | open   | 20200101 |           | me      |
etc.
Where I'm at...
I can go:
SELECT c["value"] AS ProjectName
FROM c in t.properties
WHERE c["name"] = "projectName"
... this will give me each projectName
I can do that a heap of times to get the 'values' (status, openDate, manager, etc)
If I want to combine them, I would need to join all those sub-queries together on 'id'. But 'id' is not in the scope of the SELECT, so how do I get it? Even if I could, it sounds like it would be very expensive (in RUs) to execute.
I think I'm overcomplicating this, but I can't quite get my head around the Cosmos syntax.
Help??
You can achieve it with JOINs and WHERE expressions, although the schema is not ideal for querying and you should consider changing it.
SELECT
c['projectId'], --c.projectId also works, but value is a reserved keyword
n['value'] AS projectName,
s['value'] AS status
FROM c
JOIN n IN c.properties
JOIN s IN c.properties
WHERE n['name'] = 'projectName' AND s['name'] = 'status'
--note all filtered properties must appear exactly once for it to work properly
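The caveat in that comment comes from the fact that each JOIN pairs the document with every element of the array. The following Python sketch models that cross product in memory (join_rows and the p002/p003 documents are hypothetical) to show that a missing property drops the document and a repeated one duplicates the row:

```python
from itertools import product
from typing import Any


def join_rows(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Mimic FROM c JOIN n IN c.properties JOIN s IN c.properties
    WHERE n.name = 'projectName' AND s.name = 'status'."""
    rows = []
    # Two JOINs over the same array produce every (n, s) pair: a cross product.
    for n, s in product(doc["properties"], doc["properties"]):
        if n["name"] == "projectName" and s["name"] == "status":
            rows.append({"projectId": doc["projectId"], "projectName": n["value"], "status": s["value"]})
    return rows


doc = {"projectId": "p001", "properties": [
    {"name": "projectName", "value": "contoso"},
    {"name": "status", "value": "open"},
]}
# Hypothetical documents: one without 'status', one with 'status' listed twice.
doc_missing = {"projectId": "p002", "properties": [{"name": "projectName", "value": "fabrikam"}]}
doc_dup = {"projectId": "p003", "properties": doc["properties"] + [{"name": "status", "value": "closed"}]}

print(join_rows(doc))           # exactly one row
print(join_rows(doc_missing))   # no rows; the document disappears
print(len(join_rows(doc_dup)))  # two rows for a single document
```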
Edit: here is a new query that removes the requirement that each filtered property appear exactly once.
SELECT
c['projectId'],
ARRAY(
SELECT VALUE n['value']
FROM n IN c.properties
WHERE n['name'] = 'projectName'
)[0] AS projectName,
ARRAY(
SELECT VALUE n['value']
FROM n IN c.properties
WHERE n['name'] = 'status'
)[0] AS status
FROM c
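The ARRAY(...)[0] pattern takes the first matching value per property and never multiplies rows. A rough Python equivalent of the resulting pivot, assuming the property names from the question's table (first_value and pivot are hypothetical helpers; Python's None stands in for Cosmos DB leaving the field undefined):

```python
from typing import Any, Optional

COLUMNS = ["projectName", "status", "openDate", "closeDate", "manager"]


def first_value(props: list[dict[str, Any]], name: str) -> Optional[Any]:
    """Mimic ARRAY(SELECT VALUE n['value'] FROM n IN c.properties WHERE n['name'] = name)[0]."""
    matches = [p["value"] for p in props if p.get("name") == name]
    return matches[0] if matches else None  # Cosmos DB omits the field instead of returning null


def pivot(doc: dict[str, Any]) -> dict[str, Any]:
    """Flatten one document into a single row keyed by its projectId."""
    row: dict[str, Any] = {"ProjectId": doc["projectId"]}
    for col in COLUMNS:
        row[col] = first_value(doc["properties"], col)
    return row


project_doc = {"projectId": "p001", "properties": [
    {"name": "projectName", "value": "contoso"},
    {"name": "status", "value": "open"},
    {"name": "openDate", "value": "20200101"},
    {"name": "manager", "value": "me"},
]}
print(pivot(project_doc))
```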

Top N per Classification in CosmosDB

I'm kinda stuck on this issue. I have several hundred documents of a certain model stored in Cosmos DB and I can't seem to get the top 5 of each category.
This is the model:
{
"id": "06224840-6b88-4394-9324-4d1628383702",
"name": "Reservation",
"description": null,
"client": null,
"reference": null,
"isMonitoring": false,
"monitoringSince": null,
"hasRiskProfile": false,
"riskProfile": -1,
"monitorFrequency": 0,
"mainBindable": null,
"organizationId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"userId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"createDate": "2020-08-18T11:00:02.5266403Z",
"updateDate": "2020-08-18T11:00:02.5266419Z",
"lastMonitorDate": "2020-08-18T11:00:02.5266427Z"
}
So what I'm trying to do is use C# to get the top 5 from each risk profile where the organizationId matches. GroupBy through LINQ throws an error, and a ROW_NUMBER() query combined with PARTITION BY doesn't seem to work either.
Is there any way I can get this to work in a single query compatible with Cosmos DB?
EDIT:
What I am trying to achieve in Cosmos DB is roughly this:
WITH TopEntries AS (
SELECT *
,ROW_NUMBER() OVER (
PARTITION BY [riskProfile]
ORDER BY [updateDate] DESC
) AS [ROW NUMBER]
FROM [reservations]
WHERE [organizationId] = 'xyz'
)
SELECT * FROM TopEntries
WHERE TopEntries.[ROW NUMBER] <= 5
It sounds like combining TOP and ORDER BY would do the job. For example:
SELECT TOP 5 *
FROM c
WHERE c.organizationId = "xyz"
ORDER BY c.riskProfile
You can build such queries with parameters in the .NET SDK as in this sample.
The functionality you are trying to achieve is not directly possible with a single query in Cosmos DB. It takes two steps (adapt them to your own documents).
First, group by the category:
SELECT c.city FROM c where c.org = 'xyz' group by c.city
Then, for each value returned by the first query, run:
SELECT TOP 5 * FROM C WHERE C.city = 'delhi' order by C.date desc
You can refer to a similar issue here:
https://learn.microsoft.com/en-us/answers/questions/38454/index.html
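If the per-organization result set is small (the question mentions several hundred documents), the same two steps can also be done on the client after a single query filtered by organizationId. A minimal Python sketch under that assumption (top_n_per_group and the sample data are hypothetical; the sort relies on updateDate being ISO 8601 strings in one consistent format):

```python
from collections import defaultdict
from typing import Any


def top_n_per_group(docs: list[dict[str, Any]], org_id: str, group_key: str,
                    order_key: str, n: int = 5) -> dict[Any, list[dict[str, Any]]]:
    """Client-side equivalent of ROW_NUMBER() OVER (PARTITION BY group_key ORDER BY order_key DESC) <= n."""
    groups: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for doc in docs:
        if doc.get("organizationId") == org_id:  # WHERE organizationId = @org
            groups[doc[group_key]].append(doc)  # step 1: one bucket per distinct group value
    # Step 2, per group: ORDER BY order_key DESC, then TOP n
    return {g: sorted(items, key=lambda d: d[order_key], reverse=True)[:n]
            for g, items in groups.items()}


docs = [
    {"id": str(i), "organizationId": "xyz", "riskProfile": -1 if i < 7 else 1,
     "updateDate": f"2020-08-{10 + i:02d}T11:00:02Z"}
    for i in range(9)
]
docs.append({"id": "other-org", "organizationId": "abc", "riskProfile": -1,
             "updateDate": "2020-09-01T00:00:00Z"})

top = top_n_per_group(docs, "xyz", "riskProfile", "updateDate")
print({g: [d["id"] for d in items] for g, items in top.items()})
```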

Getting values from array in Cosmos Db

My document that I save in Cosmos DB looks like this:
{
"id": "abc123",
"myProperty": [
"1905844b-6ca9-4967-ba40-a736b685ca62",
"b03cc85c-ef0b-4f48-9c31-800de089190a"
]
}
As you can see, in the myProperty property, I have an array of GUID values and I want to read them as an array/list of GUID values but I'm having trouble formulating the correct SELECT statement.
The output I'm looking for is:
[
"1905844b-6ca9-4967-ba40-a736b685ca62",
"b03cc85c-ef0b-4f48-9c31-800de089190a"
]
The closest I could get is this SELECT statement:
SELECT VALUE c.myProperty FROM c WHERE c.id = "abc123"
But this doesn't give me exactly what I want either. This gives me an array within an array i.e.
[
[
"1905844b-6ca9-4967-ba40-a736b685ca62",
"b03cc85c-ef0b-4f48-9c31-800de089190a"
]
]
What should my SELECT statement look like to get what I want?
I don't think you can get anything else, because Cosmos DB always returns an array in response to a query, since a query can potentially produce any number of results. So you will always get a top-level array that wraps all your results, even if there is only one.
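In practice this means unwrapping the result set on the client. A minimal Python sketch, where the JSON string stands in for the query response body and unwrap_single is a hypothetical helper; uuid.UUID turns the strings into GUID values:

```python
import json
import uuid
from typing import Any


def unwrap_single(result_set: list[Any]) -> Any:
    """Return the only item of a query result set (hypothetical helper)."""
    if len(result_set) != 1:
        raise ValueError(f"expected exactly one result, got {len(result_set)}")
    return result_set[0]


# Shape returned by: SELECT VALUE c.myProperty FROM c WHERE c.id = "abc123"
response_body = json.loads(
    '[["1905844b-6ca9-4967-ba40-a736b685ca62", "b03cc85c-ef0b-4f48-9c31-800de089190a"]]'
)
guids = unwrap_single(response_body)         # the inner array of strings
guid_values = [uuid.UUID(g) for g in guids]  # the same values parsed as GUIDs
print(guids)
```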

CosmosDB SQL String functions not working with a join?

I have a collection in DocumentDB with objects that look like this:
{
"id":"1de03a93-729d-43da-985a-12584079b4f8",
"Components":[
{
"Name":"MyComponentName1",
"Value": 12345
},
{
"Name":"MyComponentName2",
"Value": 34567
},
{
"Name":"MyComponentName3",
"Value": 56789
}
]
...other properties irrelevant to question...
}
When querying CosmosDB, I have the following query:
SELECT VALUE d FROM c
JOIN d IN c.Components
WHERE d.Name="MyComponentName1"
which correctly returns:
{
"Name":"MyComponentName1",
"Value":12345
}
However, when I attempt to query based on a String operator:
SELECT VALUE d FROM c
JOIN d IN c.Components
WHERE CONTAINS(d.Name,'MyComponent') --OR STARTSWITH OR ENDSWITH
I get no results.
If I take the same query as above but I add an id restriction to the where clause:
SELECT VALUE d FROM c
JOIN d IN c.Components
WHERE CONTAINS(d.Name,'MyComponent')
AND c.id = "1de03a93-729d-43da-985a-12584079b4f8"
I get back the results I expect, but obviously only for that id. I need all of the documents that match the String operator.
Is this a bug with CosmosDB, or am I doing something wrong?
Nick,
Make sure that you're following all the continuations when you execute this query. Keep in mind that a query with CONTAINS results in a full scan, so it might not finish in a single page of results, and early pages can come back empty. The same is true of ENDSWITH. STARTSWITH, however, should use the index, but only if the collection's indexing policy defines a range index on strings; otherwise it will still be a scan.
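A sketch of the draining pattern in Python, assuming a hypothetical fetch_page(token) function that returns one page of items plus the next continuation token (None when the query is done). The fake pages imitate a scan whose first pages come back empty, which is what makes a query look like it returned nothing:

```python
from typing import Any, Callable, Optional

# One page of results plus the next continuation token (None means the query is finished).
Page = tuple[list[Any], Optional[str]]


def drain(fetch_page: Callable[[Optional[str]], Page]) -> list[Any]:
    """Keep requesting pages until there is no continuation token, even when pages are empty."""
    results: list[Any] = []
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        results.extend(items)
        if token is None:
            return results


# Hypothetical pages: the scan finds the matching component only on the third page.
pages: dict[Optional[str], Page] = {
    None: ([], "page-2"),
    "page-2": ([], "page-3"),
    "page-3": ([{"Name": "MyComponentName1", "Value": 12345}], None),
}

first_page_only = pages[None][0]  # what you see if you stop after one request
all_results = drain(lambda token: pages[token])
print(all_results)
```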
