Cosmos db get one record of each partition key - azure-cosmosdb

I have a container keeping some documents like:
{
"id": "deece304-XXXXXXX-88e8-fcfc0c750e97",
"log_VehicleId": 123,
"latitude": -000.000,
"longitude": 111.111,
"_ts": 1593825193
}
My partition key is "log_VehicleId". I need a Cosmos SQL query that gives me the newest record of each partition, something like:
SELECT TOP 1 * FROM container c WHERE c.log_VehicleId IN (123,234,312,123,873)
ORDER BY c._ts DESC
so that I get the newest record per "log_VehicleId".
A LINQ equivalent of that would be fantastic too.
Thanks

According to the documentation:
"You currently cannot use GROUP BY with an ORDER BY clause but this is planned."
So this is not supported at the moment. You can vote for and track this feature here.
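Until a single query can do this, one possible workaround is to run one single-partition query per vehicle id and take the only row each returns. The sketch below only builds those query specs. The helper name `newestPerVehicleQueries` is hypothetical, and the `{ query, parameters }` shape is an assumption about what your SDK's query method accepts, so adapt it as needed.

```javascript
// Hypothetical helper: builds one parameterized query per distinct vehicle id.
// Each query targets a single partition key value and asks for its newest document.
function newestPerVehicleQueries(vehicleIds) {
  const distinctIds = [...new Set(vehicleIds)]; // drop repeated ids such as 123
  return distinctIds.map((id) => ({
    query:
      "SELECT TOP 1 * FROM c WHERE c.log_VehicleId = @vehicleId ORDER BY c._ts DESC",
    parameters: [{ name: "@vehicleId", value: id }],
  }));
}

const specs = newestPerVehicleQueries([123, 234, 312, 123, 873]);
console.log(specs.length); // one spec per distinct id
```

Each spec stays inside one partition, so the cost grows with the number of vehicle ids rather than with the size of the container.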

Related

Query dynamodb db list items with IN clause

I have a dynamodb table whose items have below structures.
{
"url": "some-url1",
"dependencies": [
"dependency-1",
"dependency-2",
"dependency-3",
"dependency-4"
],
"status": "active"
}
{
"url": "some-url2",
"dependencies": [
"dependency-2"
],
"status": "inactive"
}
{
"url": "some-url3",
"dependencies": [
"dependency-1"
],
"status": "active"
}
Here, url is defined as the partition key and there is no sort key.
The query needs to find all the records with a specific dependency and status.
For example: find all the records where dependency-1 is present in the dependencies list and the status is active.
So for the above records, the 1st and 3rd records should be returned.
Do I need to set a GSI on dependencies, or is this something that cannot be done in DynamoDB?
You cannot create a GSI on a nested value. You can, however, create a GSI on status, but be careful: status has low cardinality, so if all of the items being written to the table have the same status you could limit your throughput to 1000 writes per second. If you never intend to scale that high, this is not an issue.
Your other option is to use a Scan, which reads your entire data set, with a FilterExpression to filter on dependency and status.
Depending on the SDK you use, you can find some example operations here:
https://github.com/aws-samples/aws-dynamodb-examples/tree/master/DynamoDB-SDK-Examples
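As a rough illustration of the Scan option, the sketch below builds the request parameters and applies the same condition to the sample items locally. The helper names and the table name are hypothetical, and the values are written in plain document-client style (an assumption; the low-level client needs typed values). `status` is aliased because it is a DynamoDB reserved word.

```javascript
// Hypothetical helper: parameters for a Scan that filters on a list member
// and on status. Both attribute names are aliased via ExpressionAttributeNames.
function buildDependencyScanParams(tableName, dependency, status) {
  return {
    TableName: tableName,
    FilterExpression: "contains(#deps, :dep) AND #st = :status",
    ExpressionAttributeNames: { "#deps": "dependencies", "#st": "status" },
    ExpressionAttributeValues: { ":dep": dependency, ":status": status },
  };
}

// Local stand-in for the filter, to show which items would match.
function matchesFilter(item, dependency, status) {
  return item.dependencies.includes(dependency) && item.status === status;
}

const sampleItems = [
  { url: "some-url1", dependencies: ["dependency-1", "dependency-2", "dependency-3", "dependency-4"], status: "active" },
  { url: "some-url2", dependencies: ["dependency-2"], status: "inactive" },
  { url: "some-url3", dependencies: ["dependency-1"], status: "active" },
];

const scanParams = buildDependencyScanParams("my-table", "dependency-1", "active");
const matchedUrls = sampleItems
  .filter((item) => matchesFilter(item, "dependency-1", "active"))
  .map((item) => item.url);
console.log(scanParams.FilterExpression);
console.log(matchedUrls); // the 1st and 3rd records from the question
```

Keep in mind that the filter is applied after the items are read, so the Scan still reads the whole table.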

Azure Cosmos Db document partition key having duplicate, but find duplicate document with combination of other columns

I have the document JSON below (a partial sample; the actual JSON is more complex and has embedded objects). Code is the partition key. I am building NoSQL documents by migrating my SQL tables, where Code and Type together make a row unique. As you can see below, Code = 4 appears twice with different Type values. For id I just generated a GUID (I was not sure what to use for the id field).
Type has only two values across the entire data set, RI or NRI. Code can be duplicated, as with Code: 4 in the sample data below, but the combination of Type and Code is unique.
Example JSON:
{
"id": "88725628-2a9a-4fc7-90ed-29c5ffbd45fa",
"Code": "4",
"Type": "RI",
"Description": "MAC/CHEESE "
},
{
"id": "88725628-9a3b-4fc7-90ed-29c5ffbd34sk",
"Code": "8",
"Type": "RI",
"Description": "Cereals"
},
{
"id": "88725628-6d9f-4fc7-90ed-29c4ffbd87de",
"Code": "4",
"Type": "NRI",
"Description": "Christmas Deal"
}
In Cosmos DB I couldn't use two columns as the partition key, so only Code is the partition key. When inserting into Cosmos DB, how do I check that a document does not already exist and insert only then? Otherwise I would end up creating duplicate documents.
CreateItemAsync --> I need a way to check whether the document already exists and create it only if it does not.
I have the code below, which checks for the item and creates it if it is not found:
try
{
// Read the item to see if it exists.
ItemResponse<Item> itemResponse = await this.container.ReadItemAsync<Item>(itm.Id, new PartitionKey(itm.Code));
}
catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
{
// The item was not found, so create it. The partition key value is the item's Code.
ItemResponse<Item> itemResponse = await this.container.CreateItemAsync<Item>(itm, new PartitionKey(itm.Code));
}
But in the ReadItemAsync parameters above, how do I know the id, given that it is a GUID randomly generated on every insert? Is there a better way to choose the id property before inserting into Cosmos DB, so that it can be used in ReadItemAsync?
The second parameter is the partition key. If I give Code as the partition key, it won't work as expected, because Code can legitimately be duplicated with different "Type" values. Code and Type together make a document unique, and we shouldn't allow another document to be inserted if both are the same.
How do I do this in a Cosmos DB insert? I have the following questions:
id field --> can I generate a GUID and save the document, or does the id field have a purpose that can be used during reads?
Is it OK to pick a partition key that can have duplicates, like the Code field?
How do I check whether a document exists before insert, given that Code can be duplicated and only Code together with Type is unique?
Any suggestions?
If code and type make a unique row, then you should use the value of type for id rather than generating a GUID, because in Cosmos DB the combination of your partition key and id must be unique.
Then, when you do an insert, if the data is already there it will throw an exception (a conflict, HTTP status 409), which you can catch. For reads, if you know the values of code and type, you can use them to perform a point read to get a single row of data, rather than using a query. This is the most efficient way to fetch data in Cosmos DB.
It is fine to have duplicates for partition key values. You only need to make sure that you have less than 20 GB of data for each partition key value.
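To make the idea concrete, here is a small sketch that uses an in-memory stand-in for a container. `FakeContainer`, `toDocument`, and `insertIfAbsent` are hypothetical names, not part of the Cosmos DB SDK. The stand-in only mimics the rule that partition key plus id must be unique and that a duplicate create fails with a 409.

```javascript
// In-memory stand-in for a container, NOT the Cosmos DB SDK. It rejects a
// second item with the same partition key (Code) and id, like a 409 Conflict.
class FakeContainer {
  constructor() {
    this.items = new Map();
  }
  create(item) {
    const key = `${item.Code}|${item.id}`; // partition key + id
    if (this.items.has(key)) {
      const err = new Error("Conflict");
      err.code = 409;
      throw err;
    }
    this.items.set(key, item);
    return item;
  }
}

// Use Type as the id, so Code (partition key) + id identifies exactly one row.
function toDocument(row) {
  return { id: row.Type, Code: row.Code, Type: row.Type, Description: row.Description };
}

// Returns true if the row was inserted, false if it already existed.
function insertIfAbsent(container, row) {
  try {
    container.create(toDocument(row));
    return true;
  } catch (err) {
    if (err.code === 409) return false; // duplicate Code + Type
    throw err; // anything else is unexpected
  }
}

const fakeContainer = new FakeContainer();
const firstInsert = insertIfAbsent(fakeContainer, { Code: "4", Type: "RI", Description: "MAC/CHEESE" });
const otherType = insertIfAbsent(fakeContainer, { Code: "4", Type: "NRI", Description: "Christmas Deal" });
const duplicate = insertIfAbsent(fakeContainer, { Code: "4", Type: "RI", Description: "MAC/CHEESE" });
console.log(firstInsert, otherType, duplicate); // true true false
```

With this id scheme no read is needed before the insert: the create call itself reports the duplicate.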

Cosmos DB queries - using ORDER BY when a property does not exist in all documents

We are experiencing an issue when writing queries for Cosmos DB (DocumentDB): we want to create a new document property and use it in an ORDER BY clause.
If, for example, we had a set of documents like:
{
"Name": "Geoff",
"Company": "Acme"
},
{
"Name": "Bob",
"Company": "Bob Inc"
}
...and we write a query like SELECT * FROM c ORDER BY c.Name this works fine and returns both documents
However, if we were to add a new document with an additional property:
{
"Name": "Geoff",
"Company": "Acme"
},
{
"Name": "Bob",
"Company": "Bob Inc"
},
{
"Name": "Sarah",
"Company": "My Company Ltd",
"Title": "President"
}
...and we write a query like SELECT * FROM c ORDER BY c.Title, it returns only the document for Sarah and excludes the two without a Title property.
This means that the ORDER BY clause is behaving like a filter rather than just a sort, which seems unexpected.
It seems likely that all document schemas will gain properties over time. Unless we go back and add these properties to all existing documents in the container, we can never use them in an ORDER BY clause without excluding records.
Does anyone have a solution that lets ORDER BY affect only the sort order of the result set?
Currently, ORDER BY works off of indexed properties, and missing values are not included in the result of a query using ORDER BY.
As a workaround, you could do two queries and combine the results:
The current query you're doing, with ORDER BY, returning all documents containing the Title property, ordered
A second query, returning all documents that don't have Title defined.
The second query would look something like:
SELECT * FROM c
WHERE NOT IS_DEFINED(c.Title)
Also note that, according to this note within the EF Core repo issue list, behavior is a bit different when using composite indexes (where documents with missing properties are returned).
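The two-query workaround can be sketched as follows. The two functions below only emulate the query results in memory (they are stand-ins, not SDK calls), and placing the documents without a Title last is an assumption you may want to reverse.

```javascript
// Emulates: SELECT * FROM c ORDER BY c.Title
// (documents without a Title are dropped, as described above).
function queryWithTitleOrdered(docs) {
  return docs
    .filter((d) => d.Title !== undefined)
    .sort((a, b) => (a.Title < b.Title ? -1 : a.Title > b.Title ? 1 : 0));
}

// Emulates: SELECT * FROM c WHERE NOT IS_DEFINED(c.Title)
function queryWithoutTitle(docs) {
  return docs.filter((d) => d.Title === undefined);
}

// Combines both result sets: ordered documents first, the rest afterwards.
function allOrderedByTitle(docs) {
  return [...queryWithTitleOrdered(docs), ...queryWithoutTitle(docs)];
}

const people = [
  { Name: "Geoff", Company: "Acme" },
  { Name: "Bob", Company: "Bob Inc" },
  { Name: "Sarah", Company: "My Company Ltd", Title: "President" },
];
const combinedNames = allOrderedByTitle(people).map((d) => d.Name);
console.log(combinedNames); // Sarah first, then the two documents without a Title
```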

DocumentDB adding ORDER BY clause uses excessive RUs

I have a partitioned collection with about 400k documents in a particular partition. Ideally this would be more distributed, but I need to deal with all the documents in the same partition for transaction considerations. I have a query which includes the partition key and the document id, which returns quickly with 2.58 RUs of usage.
This query is dynamic and could potentially be constructed with an IN clause to search for multiple document ids. For that reason I added an ORDER BY to ensure the results come back in a consistent order. Adding the clause, however, caused the RUs to skyrocket to almost 6000! Given that the WHERE clause should filter the results down to a handful before sorting, I was surprised by this. It almost seems as if the ORDER BY is applied before the WHERE clause, which cannot be right. Is there something under the covers with the ORDER BY clause that would explain this behavior?
Example document:
{
"DocumentType": "InventoryRecord", (partition key, string)
"id": "7867f600-c011-85c0-80f2-c44d1cf09f36", (DocDB-assigned GUID, stored as string)
"ItemNumber": "123345", (string)
"ItemName": "Item1" (string)
}
With a Query looking like this:
SELECT * FROM c where c.DocumentType = 'InventoryRecord' and c.id = '7867f600-c011-85c0-80f2-c44d1cf09f36' order by c.ItemNumber
You should at least add a range index on ItemNumber. This should ensure that the ordering works as expected. The addition to your indexing policy would look like this:
{
"path": "/ItemNumber/?",
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": -1
}
]
}

DynamoDB Descending Order fetch records

I have 100 records in a collection named 'users':
{
"name":'senthilkumar',
"email":'senthily88#gmail.com', //HashKey
"age":21,
"created":1465733486137, //RangeKey-timestamp
}
I need to fetch records the way the following SQL query would:
select * from users order by created desc limit 10
How can I get records in this form from DynamoDB?
DynamoDB sorts the results by the range key attribute. You can set the ScanIndexForward boolean parameter to true for ascending or false for descending order.
Resource: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html
Use the KeyConditionExpression parameter to provide a specific value
for the partition key. The Query operation will return all of the
items from the table or index with that partition key value. You can
optionally narrow the scope of the Query operation by specifying a
sort key value and a comparison operator in KeyConditionExpression.
You can use the ScanIndexForward parameter to get results in forward
or reverse order, by sort key.
To save JSON data to DynamoDB, use put():
var Newparams = {
TableName: this.SuffleTableName,
Item: {
"userId": /* YOUR PRIMARY KEY */,
"addedAt": /* YOUR SORT KEY */,
"status": /* Additional Datas */,
}
}
Fetch data from DynamoDB using query():
var QueryParam = {
TableName: 'YOUR TABLE NAME HERE',
IndexName: 'YOUR INDEX NAME HERE', // only if you query an index you created
KeyConditionExpression: "userId = :userId", // your partition key, same name as in put()
ExpressionAttributeValues: {
":userId": UserId,
},
ScanIndexForward: false, // descending order; set to true for ascending
ExclusiveStartKey: LastEvalVal, // pagination: LastEvaluatedKey of the previous page (omit on the first request)
Limit: 10 // items per request
}
If you want to return all rows in your table, you cannot use the Query API, because it requires a partition key value to filter your results by. For example, if your partition key is name, Query can only bring back the subset of results with name equal to a given value, such as name = senthilkumar.
If you want to return all rows in your table, you must use the Scan API: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SQLtoNoSQL.ReadData.Scan.html
Note that Scan results are ordered by range key only within each partition key value, not across the whole table, and Scan cannot return them in reverse. You would need to sort or reverse the result set in the application tier, in whatever language you're writing your code in.
Scan does not scale well, and it is not possible to use Scan to create a paginated, reverse-sorted solution if your table contains items with unique partition keys.
If this is your situation, and you want paginated, reverse-sorted sets back from DynamoDB, you will need to reconsider the design of your table, and which attributes are the partition key, range key, or index keys, so that you can use the Query API.
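For a small table such as the 100 records in the question, sorting in the application tier could look like the sketch below. It assumes all Scan pages have already been collected into one array. The function name `newestFirstPage` is hypothetical, and `created` is the timestamp attribute from the question.

```javascript
// Sorts already-fetched items by the "created" timestamp, newest first,
// and returns one page. The input array is copied, not mutated.
function newestFirstPage(items, pageSize, pageIndex = 0) {
  const sorted = [...items].sort((a, b) => b.created - a.created);
  const start = pageIndex * pageSize;
  return sorted.slice(start, start + pageSize);
}

const scannedUsers = [
  { name: "a", created: 1465733486137 },
  { name: "b", created: 1465733486999 },
  { name: "c", created: 1465733486500 },
];
const firstPage = newestFirstPage(scannedUsers, 2).map((u) => u.name);
console.log(firstPage); // the two newest users, newest first
```

This costs a full Scan on every call, so it is only reasonable while the table stays small.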
