Cosmos DB - Query for newest document of select partitions? - azure-cosmosdb

Consider a CosmosDB container with the following document model:
{
id: <string>,
userId: <string>, // partition key
data: <string>
}
I have a need to provide a query with N user ids and get the newest document for each one.
So for example, if I have this data in the container:
{ id: '1', userId: 'user1', data: 'a', _ts: 1 },
{ id: '2', userId: 'user1', data: 'b', _ts: 2 },
{ id: '3', userId: 'user2', data: 'c', _ts: 10 },
{ id: '4', userId: 'user2', data: 'd', _ts: 5 },
{ id: '5', userId: 'user3', data: 'e', _ts: 3 },
{ id: '6', userId: 'user3', data: 'f', _ts: 4 },
{ id: '7', userId: 'user4', data: 'g', _ts: 100 },
{ id: '8', userId: 'user4', data: 'h', _ts: 99 },
{ id: '9', userId: 'user5', data: 'i', _ts: 1 },
{ id: '10', userId: 'user5', data: 'j', _ts: 2 },
I want to do something like this:
-- This doesn't work
SELECT c.userId, (SELECT TOP 1 d.id, d.data WHERE d.userId = c.userId FROM d ORDER BY d._ts DESC) AS newest
WHERE c.userId IN ['user1', 'user2', 'user4', 'user5']
To get this result:
{ userId: 'user1', newest: { id: '2', data: 'b' } },
{ userId: 'user2', newest: { id: '3', data: 'c' } },
{ userId: 'user4', newest: { id: '7', data: 'g' } },
{ userId: 'user5', newest: { id: '10', data: 'j' } },
From what I can tell, JOIN in CosmosDB cannot be used to filter correlated documents. Is there still a way to accomplish this? I am open to using a stored procedure, but from what I can tell a stored procedure can only execute against a single partition, given its key. In my case, the primary grouping is the partition key.
I have considered a fan-out request approach, but I might be querying for 50 to 100 user ids at a time in the query. In that case it might be faster to just get all documents in each partition and when iterating only keep the newest -- but that's a large paged response to sift through.
My final thought is I could use ASB/EventGrid/Function and another dependent CosmosDB container to always clone the most recent updated document every time a document is updated, but it seems like overkill. Surely there is a way to construct a query to do what I want?
Thanks

I have an idea like
select c._ts from c where ARRAY_CONTAINS((select value max(c._ts) from c group by c.userId), c._ts)
But it doesn't return the result, because select value max(c._ts) from c group by c.userId isn't recognized as an array, and if I use Array(select value max(c._ts) from c group by c.userId) instead, it returns all items.
So how about executing the SQL twice?
Get the timestamp array first:
select value max(c._ts) from c where c.userId in ('user1','user2') group by c.userId
and then copy the result in as the input to the ARRAY_CONTAINS function:
select c._ts,c.data from c where ARRAY_CONTAINS([1623306298,1623306259,1623306217], c._ts)

One way of doing this would be to use the following approach.
SELECT t.userid,
SUBSTRING(t.concat, 28,8000) AS data
FROM
(
SELECT c.userid,
MAX(CONCAT(TimestampToDateTime(c._ts*1000),c.data)) AS concat
FROM c
WHERE c.userid IN ('user1', 'user2')
GROUP BY c.userid
) AS t
which returns a result like
[
{
"userid": "user1",
"data": "b"
},
{
"userid": "user2",
"data": "d"
}
]
The derived table t returns results like the following...
[
{
"userid": "user2",
"concat": "2021-06-11T17:42:03.0000000Zd"
},
{
"userid": "user1",
"concat": "2021-06-11T17:41:41.0000000Zb"
}
]
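The document with the highest _ts per user has the lexicographically highest datetime prefix in the concatenated string, so MAX picks it. The ancillary data appended after that prefix is then extracted with SUBSTRING (the prefix is 28 characters long and SUBSTRING is zero-based, so the data starts at index 28).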
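To see why the string comparison picks the newest document, here is a minimal in-memory sketch in Python. It is not Cosmos DB code: `iso_prefix` and `newest_data_per_user` are hypothetical helpers, and the timestamp format is assumed to match the 28-character strings shown in the output above.

```python
from datetime import datetime, timezone

def iso_prefix(ts_seconds):
    """Format epoch seconds as a fixed-width 28-character UTC string,
    mirroring the datetime strings shown in the query output (assumption)."""
    dt = datetime.fromtimestamp(ts_seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + ".0000000Z"

def newest_data_per_user(docs, user_ids):
    """Emulate MAX(CONCAT(datetime, data)) ... GROUP BY userId, then SUBSTRING."""
    best = {}
    for doc in docs:
        if doc["userId"] not in user_ids:
            continue
        candidate = iso_prefix(doc["_ts"]) + doc["data"]
        # Fixed-width prefixes compare in time order, so the string max is the newest.
        if doc["userId"] not in best or candidate > best[doc["userId"]]:
            best[doc["userId"]] = candidate
    # Drop the 28-character datetime prefix to recover the data.
    return {uid: s[28:] for uid, s in best.items()}

docs = [
    {"id": "1", "userId": "user1", "data": "a", "_ts": 1},
    {"id": "2", "userId": "user1", "data": "b", "_ts": 2},
    {"id": "3", "userId": "user2", "data": "c", "_ts": 10},
    {"id": "4", "userId": "user2", "data": "d", "_ts": 5},
]
print(newest_data_per_user(docs, {"user1", "user2"}))
```

Note that if two documents of the same user share a _ts, the comparison falls through to the data part, so the one with the larger data string wins.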
It should be able to use the index for the WHERE clause, but it will then need to look at all documents for the given userids (so if there are many documents per user, doing a separate TOP 1 query for each one would likely be much better).
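The separate TOP 1 queries could look roughly like the following Python sketch. It assumes `container` is an azure-cosmos `ContainerProxy` (its `query_items` method accepts `query`, `parameters` and `partition_key`); the function name `newest_per_user` and the result shape are made up for illustration, and the queries run one after another rather than in parallel.

```python
def newest_per_user(container, user_ids):
    """Run one single-partition TOP 1 query per user id.

    `container` is assumed to be an azure-cosmos ContainerProxy, or any
    object with a compatible query_items method.
    """
    query = (
        "SELECT TOP 1 c.id, c.data FROM c "
        "WHERE c.userId = @userId ORDER BY c._ts DESC"
    )
    results = []
    for user_id in user_ids:
        items = container.query_items(
            query=query,
            parameters=[{"name": "@userId", "value": user_id}],
            partition_key=user_id,  # keeps each query inside one partition
        )
        newest = next(iter(items), None)  # None if the user has no documents
        if newest is not None:
            results.append({"userId": user_id, "newest": newest})
    return results
```

Each query targets a single logical partition, so the cost per user should stay small no matter how many documents that user has; whether 50 to 100 such round trips are acceptable depends on the latency budget.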

Related

Update expression for a list of maps in DynamoDB

I have the following DynamoDB table:
{
record_id: Decimal(10),
...,
options: [ # This is a List of maps
{option_id: 1, counter: Decimal(0), ...},
{option_id: 2, counter: Decimal(0), ...},
],
...
}
It consists of items, each with a unique record_id and the target options list. That list contains maps. Each map has an option_id attribute, and I would like to access the entry in the options list whose option_id equals some target my_option_id and increment its counter.
For example, for the above example, given my_record_id=10 and my_option_id=2, I would like to update the second option item, with option_id=2, and increment its counter by 1, so this {option_id: 2, counter: Decimal(0), ...} becomes {option_id: 2, counter: Decimal(1), ...}.
I am using Python and boto3, but I imagine the syntax here is specific to DynamoDB. Here is what I have so far:
response = table.update_item(
Key={
'record_id': my_record_id,
},
UpdateExpression='SET options.#s.counter = options.#s.counter + :val',
ExpressionAttributeNames={
"#s": my_option_id
},
ExpressionAttributeValues={
':val': Decimal(1)
},
ReturnValues="UPDATED_NEW"
)
It seems the easy fix was to make options a map instead of a list, and then make the option_id the key of that map. Then my code works as expected:
response = table.update_item(
Key={
'record_id': my_record_id,
},
UpdateExpression='SET options.#s.counter = options.#s.counter + :val',
ExpressionAttributeNames={
"#s": my_option_id
},
ExpressionAttributeValues={
':val': Decimal(1)
},
ReturnValues="UPDATED_NEW"
)
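A minimal sketch of that reshaping, assuming the option maps look like the ones in the question. `options_list_to_map` and `increment_counter_kwargs` are hypothetical helper names; the keys are converted to strings because DynamoDB map keys are always strings. Aliasing every path segment with a `#` placeholder is an extra precaution added here in case an attribute name collides with a DynamoDB reserved word; the expression is otherwise the same as in the post.

```python
from decimal import Decimal

def options_list_to_map(options):
    """Re-key a list of option maps by option_id (as a string)."""
    return {str(opt["option_id"]): opt for opt in options}

def increment_counter_kwargs(record_id, option_id):
    """Build keyword arguments for table.update_item(**kwargs)."""
    return {
        "Key": {"record_id": record_id},
        # Same path as in the post, with every segment aliased.
        "UpdateExpression": "SET #opts.#s.#cnt = #opts.#s.#cnt + :val",
        "ExpressionAttributeNames": {
            "#opts": "options",
            "#s": str(option_id),
            "#cnt": "counter",
        },
        "ExpressionAttributeValues": {":val": Decimal(1)},
        "ReturnValues": "UPDATED_NEW",
    }

options = [
    {"option_id": 1, "counter": Decimal(0)},
    {"option_id": 2, "counter": Decimal(0)},
]
print(options_list_to_map(options))
print(increment_counter_kwargs(10, 2))
```

With a boto3 `Table` resource this would be called as `table.update_item(**increment_counter_kwargs(my_record_id, my_option_id))`.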

KQL - Convert Dynamic Array of Key Value to Key Value dictionary

I have a cell in a table column that is a dynamic. It was ingested from .NET as a Dictionary, but in Kusto it looks like an array of objects, each with a key and a value property:
[
{"key":"ProjectId","value":"1234"},
{"key":"ProjectName","value":"Albatros"},
{"key":"User","value":"Bond"}
]
I want to convert the contents of the cell in my Kusto query to the following dynamic:
{
"ProjectId": "1234",
"ProjectName": "Albatros",
"User": "Bond"
}
I can't figure out how to write the expression that converts it from the array into the new dynamic format.
Can anyone point me in the right direction?
You can use a combination of mv-apply and make_bag():
print d = dynamic([
{"key": "value"},
{"ProjectId": "1234"},
{"ProjectName": "Albatros"},
{"User": "Bond"}
])
| mv-apply d on (
summarize result = make_bag(d)
)
result
{ "key": "value", "ProjectId": "1234", "ProjectName": "Albatros", "User": "Bond"}
UPDATE based on your change to the original question:
print d = dynamic([
{"key":"ProjectId","value":"1234"},
{"key":"ProjectName","value":"Albatros"},
{"key":"User","value":"Bond"}
])
| mv-apply d on (
summarize result = make_bag(pack(tostring(d.key), d.value))
)
result
{ "ProjectId": "1234", "ProjectName": "Albatros", "User": "Bond"}
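For readers who think more easily in a general-purpose language, the transformation that pack() plus make_bag() performs here corresponds roughly to the following Python sketch. It only illustrates the reshaping and is not Kusto code; `to_property_bag` is a made-up name.

```python
def to_property_bag(pairs):
    """Collapse [{"key": k, "value": v}, ...] into {k: v, ...}.

    In this Python version a repeated key keeps the last value seen.
    """
    return {str(p["key"]): p["value"] for p in pairs}

d = [
    {"key": "ProjectId", "value": "1234"},
    {"key": "ProjectName", "value": "Albatros"},
    {"key": "User", "value": "Bond"},
]
print(to_property_bag(d))
```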

How to return all the id's with common value for another field in cosmos sql api?

The Cosmos documents have this structure:
{
orderNumber: 1,
productNumber:p1
},
{
orderNumber: 1,
productNumber:p2
},
{
orderNumber: 2,
productNumber:p3
}
How do I return the list of product numbers within the same order number?
For example, the result should be like
{
orderNumber:1,
products:{
p1,
p2
}
}
select c.orderNumber,count(c.orderNumber)
from c
group by c.orderNumber
I tried this query to get the count per orderNumber, which gives the product count, but how can we return the actual productNumber values?
Thank you
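If grouping on the client side is acceptable, the desired shape can be built after reading the documents. A minimal Python sketch, assuming the documents have already been fetched into a list of dicts; `products_by_order` is a hypothetical helper, and products are returned as a list rather than the brace notation above.

```python
from collections import defaultdict

def products_by_order(docs):
    """Group productNumber values under their orderNumber."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["orderNumber"]].append(doc["productNumber"])
    return [
        {"orderNumber": number, "products": products}
        for number, products in grouped.items()
    ]

docs = [
    {"orderNumber": 1, "productNumber": "p1"},
    {"orderNumber": 1, "productNumber": "p2"},
    {"orderNumber": 2, "productNumber": "p3"},
]
print(products_by_order(docs))
```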

Filtering on a aggregate function

I am using Azure Cosmos DB and trying to write a query to filter documents by name and version. I am new to Cosmos, and it seems the way I'm doing it applies the filter per record rather than to the results as a whole. Can anyone tell me the proper way to accomplish this:
select C.*
from c
JOIN (select MAX(c.version) from c where c.name = "test") maxVersion
where maxVersion = c.version
Sample data:
[{"name":"test","version":1},{"name":"test","version":2},{"name":"test","version":3}]
Results:
I get a record back for each version instead of only the max version, i.e. I should get only one record back and its version number should be 3.
When you run this SQL:
select c,maxVersion
from c
JOIN (select MAX(c.version) from c where c.name = "test") maxVersion
you will get these documents:
{
"c": {
"id": "1",
"name": "test",
"version": 1
},
"maxVersion": {
"$1": 1
}
},
{
"c": {
"id": "2",
"name": "test",
"version": 2
},
"maxVersion": {
"$1": 2
}
},
{
"c": {
"id": "3",
"name": "test",
"version": 3
},
"maxVersion": {
"$1": 3
}
}
The subquery in the JOIN is evaluated against each document individually, so maxVersion equals c.version in every document and you get multiple documents, not one.
According to your requirement, you can try something like this SQL:
SELECT TOP 1 *
FROM c
WHERE c.name = "test"
ORDER BY c.version DESC
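The difference between the two queries can be mimicked in plain Python. This is only an in-memory illustration under the assumption described above (the JOIN subquery sees one document at a time); both function names are made up.

```python
def join_subquery_result(docs, name):
    """Mimic the JOIN query: the aggregate runs over the current document only."""
    rows = []
    for doc in docs:
        if doc["name"] != name:
            continue
        max_version = max([doc["version"]])  # MAX over a single value
        if max_version == doc["version"]:    # always true, so every doc passes
            rows.append(doc)
    return rows

def top_1_by_version(docs, name):
    """Mimic SELECT TOP 1 ... ORDER BY c.version DESC over the whole result set."""
    matching = [d for d in docs if d["name"] == name]
    return sorted(matching, key=lambda d: d["version"], reverse=True)[:1]

docs = [
    {"id": "1", "name": "test", "version": 1},
    {"id": "2", "name": "test", "version": 2},
    {"id": "3", "name": "test", "version": 3},
]
print(len(join_subquery_result(docs, "test")))
print(top_1_by_version(docs, "test"))
```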

how to get the specific format in crossfilter.js i.e distinct of distinct count

This is the format of my data:
[{city:"Bhopal",id: 1},{city:"Bhopal",id: 2},{city:"Delhi",id: 3},{city:"Delhi",id:3}]
Here Delhi is repeated twice with the same id.
Now I need a count of distinct ids per city, i.e. something like:
[{key:"Bhopal",value:2}, {key:"Delhi",value:1}]
where value is the count.
Got the answer using Reductio and Crossfilter.
var payments = crossfilter([
{city: "Bhopal", id: 1},{city: "Bhopal", id: 2},{city: "Delhi", id: 3},{city: "Delhi", id: 3}
]);
var dim = payments.dimension(function(d) { return d.city; });
var group = dim.group();
var reducer = reductio()
.exception(function(d) { return d.id; })
.exceptionCount(true);
reducer(group);
console.log(group.top(Infinity));
output: [ { key: 'Bhopal', value: { exceptionCount: 2 } }, { key: 'Delhi', value: { exceptionCount: 1 } } ]
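The counting rule itself (one count per distinct id within each city) is easy to check outside Crossfilter. A small Python sketch of the same logic, with a hypothetical function name and the sample rows from the question:

```python
from collections import defaultdict

def distinct_id_count_by_city(rows):
    """Count distinct ids per city, returned in key/value form."""
    ids_by_city = defaultdict(set)
    for row in rows:
        ids_by_city[row["city"]].add(row["id"])  # a set ignores repeated ids
    return [{"key": city, "value": len(ids)} for city, ids in ids_by_city.items()]

rows = [
    {"city": "Bhopal", "id": 1},
    {"city": "Bhopal", "id": 2},
    {"city": "Delhi", "id": 3},
    {"city": "Delhi", "id": 3},
]
print(distinct_id_count_by_city(rows))
```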
