CosmosDB : How to apply concurrency while inserting a document (in parallel requests) - azure-cosmosdb

Background:
We have an Event Hub where thousands of events are logged every day. An Azure Function is configured to trigger on this Event Hub whenever new messages arrive. The function performs the following two tasks:
Write the raw message into Document DB (collection 1).
Upsert a summary (aggregated) message into collection 2 of Document DB. Before writing, it checks whether a summary message already exists based on the partition key and a unique id (not the document id); if a document exists it updates it with the new aggregated value, otherwise it inserts a new document. This unique id is computed from business logic.
Problem Statement:
More than one summary document is getting created for a PartitionKey and unique Id
Scenario Details
Let us say that for partition key PartitionKey1 no summary document has yet been created in the collection for the computed unique key.
Multiple messages (suppose 2) arrive at the Event Hub and trigger the Azure Function.
These 2 requests run concurrently. Since no existing document is found by the query, each request builds its own summary message, the Upsert call is then invoked at almost the same time by each concurrent request, and the result is multiple summary documents for the same PartitionKey and unique Id.
I've searched and read about Optimistic Concurrency, which I will definitely implement for the update scenario, but I could not find any way to handle the insert scenario.

According to your description, I suggest you use a stored procedure to achieve this.
Cosmos DB guarantees ACID for all operations that are part of a single stored procedure.
As the official documentation says: If the collection the stored procedure is registered against is a single-partition collection, then the transaction is scoped to all the documents within the collection. If the collection is partitioned, then stored procedures are executed in the transaction scope of a single partition key. Each stored procedure execution must then include a partition key value corresponding to the scope the transaction must run under.
For more information about stored procedures in Cosmos DB and how to create them, see:
Azure Cosmos DB server-side programming: Stored procedures, database triggers, and UDFs
Create and use stored procedures using C#
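For illustration, here is a minimal sketch of that approach. The stored procedure body is JavaScript (it runs server-side); registering and executing it is shown with the azure-cosmos Python SDK, though the .NET links above cover the same steps in C#. The account endpoint, key, container names, and document fields (uniqueId, aggregatedValue) are placeholders, not your actual schema:

from azure.cosmos import CosmosClient  # pip install azure-cosmos

client = CosmosClient("https://<account>.documents.azure.com:443/", "<key>")  # placeholders
container = client.get_database_client("MyDatabase").get_container_client("SummaryCollection")

# JavaScript stored procedure: the query and the create/replace run in one transaction,
# scoped to the partition key the procedure is executed with, so two concurrent callers
# cannot both take the "insert" branch for the same unique id.
UPSERT_SUMMARY_JS = """
function upsertSummary(summaryDoc) {
    var collection = getContext().getCollection();
    var query = {
        query: 'SELECT * FROM c WHERE c.uniqueId = @uid',
        parameters: [{ name: '@uid', value: summaryDoc.uniqueId }]
    };
    var accepted = collection.queryDocuments(collection.getSelfLink(), query, {},
        function (err, docs) {
            if (err) throw err;
            if (docs.length > 0) {
                // a summary already exists: merge the new aggregate into it
                var existing = docs[0];
                existing.aggregatedValue += summaryDoc.aggregatedValue;
                collection.replaceDocument(existing._self, existing,
                    function (e) { if (e) throw e; });
            } else {
                // no summary yet: create it
                collection.createDocument(collection.getSelfLink(), summaryDoc,
                    function (e) { if (e) throw e; });
            }
        });
    if (!accepted) throw new Error('Query was not accepted; retry the call.');
}
"""

# register once, then have the Azure Function call the procedure instead of
# performing a separate query + upsert from the client
container.scripts.create_stored_procedure(body={"id": "upsertSummary", "body": UPSERT_SUMMARY_JS})

summary = {"id": "some-guid", "partitionKey": "PartitionKey1",
           "uniqueId": "business-key-123", "aggregatedValue": 1}
container.scripts.execute_stored_procedure(
    sproc="upsertSummary", partition_key="PartitionKey1", params=[summary])

Because each execution is scoped to the partition key value passed in, concurrent messages with the same computed unique id are serialized inside the procedure instead of racing each other.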

Related

BizTalk send/receive - does it wait for completion of a called stored procedure?

I've set up a BizTalk design that chains a couple of send/receives to a SQL stored procedure (which inserts the data into the relevant tables).
It's organised in a specific sequence, so data goes into Table A first, and the stored procedures for the following tables then check that the data exists in Table A (a simple IF EXISTS against Table A).
I've noticed though that the flow isn't consistent further down the chain, almost as if SQL is executing the stored procedure to insert/update the record slower than the BizTalk transaction is occurring.
I've made sure that my Biz design is send/receive, as I assumed the transaction wouldn't progress until Biz received a response from the stored procedure (which would indicate SQL has finished inserting the required data).
The below example highlights where the process writes data to the Person table, but is later called upon by the Student Programme/Student Module. Occasionally, it will dehydrate on the Programme or Module stored procedure (from what I can tell, because the stored procedures are looking to see if a Person record created at the start of the flow exists)
Can anyone please confirm whether:
Send/Receive will wait for a SQL stored procedure to finish executing before progressing the BizTalk transaction further through the orchestration?
BizTalk Orchestrations have some smarts built into them: if the next shapes have no dependency on the response, then no, it might not wait for the response before executing those shapes. What you can try is setting Delivery Notification to Transmitted in the Logical Send Port settings.

Scan entire dynamo db and update records based on a condition

We have a business requirement to deprecate certain field values ("State"). So we need to scan the entire DB, find these deprecated field values, take the last record for that partition key (there can be multiple records for the same partition key; the sort key is LastUpdatedTimeepoch), and then update that record. Right now the table contains around 600k records. What's the best way to do this without bringing down the DB service in production?
I see this thread could help me
https://stackoverflow.com/questions/36780856/complete-scan-of-dynamodb-with-boto3
But my main concern is this:
It is a one-time activity, and since it will take time we cannot run it in AWS Lambda, as it would exceed the 15-minute limit. So where can I keep the code running for this?
Create an EC2 instance, assign it an IAM role with access to DynamoDB, and run the function on the EC2 instance.
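A rough sketch of that one-off pass with boto3 (the table name, attribute names, deprecated values, and the replacement value are hypothetical; run it on the EC2 instance under the role described above):

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("MyTable")  # hypothetical table name

deprecated = {"OLD_STATE_A", "OLD_STATE_B"}  # hypothetical deprecated State values
latest = {}  # partition key -> the record with the highest LastUpdatedTimeepoch

# paginate through the whole table using LastEvaluatedKey
scan_kwargs = {}
while True:
    page = table.scan(**scan_kwargs)
    for item in page["Items"]:
        if item.get("State") in deprecated:
            pk = item["PartitionKey"]  # hypothetical partition key attribute name
            if pk not in latest or item["LastUpdatedTimeepoch"] > latest[pk]["LastUpdatedTimeepoch"]:
                latest[pk] = item
    if "LastEvaluatedKey" not in page:
        break
    scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

# update only the newest record of each affected partition key
for pk, item in latest.items():
    table.update_item(
        Key={"PartitionKey": pk, "LastUpdatedTimeepoch": item["LastUpdatedTimeepoch"]},
        UpdateExpression="SET #s = :new",
        ExpressionAttributeNames={"#s": "State"},
        ExpressionAttributeValues={":new": "NEW_STATE"},
    )

Scanning with a small page size (the Limit parameter) and pausing between pages keeps the consumed read capacity low enough to avoid starving production traffic.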

How to delete all data in a partition?

I have a CosmosDB collection with a number of different partitions. I want to delete all of the data in one of the partitions so I tried to run the command:
db.myCollection.deleteAll({PartitionKey: 'pop-9q'})
Where PartitionKey is the field that I partition/shard based on. But when I execute this it returns the not very helpful message:
ERROR: An Error has occurred
Why would I be getting this message and how can I either get more details on the cause or find a resolution?
At this time you are unable to perform a bulk delete. Please upvote and comment on this feature request: Add the ability to delete ALL data in a partition
Additionally, which API are you consuming? For Gremlin API you could execute something like the following: g.V().drop()
The Microsoft.Azure.Cosmos SDK has added this ability - currently only available as a preview feature (which requires you to opt in via the portal).
See here for more details:
https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-delete-by-partition-key?tabs=dotnet-example
Sample code included there:
// Get reference to the container
var container = cosmosClient.GetContainer("DatabaseName", "ContainerName");
// Delete by logical partition key
ResponseMessage deleteResponse = await container.DeleteAllItemsByPartitionKeyStreamAsync(new PartitionKey("Contoso"));
if (deleteResponse.IsSuccessStatusCode) {
    Console.WriteLine($"Delete all documents with partition key operation has successfully started");
}
As #Mike said, a "delete all data" feature is not yet supported in the Cosmos DB SQL API or Mongo API. I notice that you have already added comments on the link above. I'll just offer a workaround here: use a bulk delete stored procedure for the Cosmos DB SQL API.
(sample code: https://gist.github.com/deepumi/2a23c5380202bddf0b85e83baf5833be)
For the Mongo API, unfortunately, even stored procedures are not supported. You could create an Azure HTTP Trigger Function to execute bulk delete code whenever you want, or merge it into your program code.
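As a rough illustration of driving such a bulk delete stored procedure from the azure-cosmos Python SDK: this assumes the procedure from the gist above is already registered under the id bulkDeleteSproc and, like the common Microsoft sample it is based on, accepts a SELECT query and returns how many documents it deleted plus a continuation flag (check the gist for the exact contract; account details and names are placeholders):

from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", "<key>")  # placeholders
container = client.get_database_client("MyDatabase").get_container_client("myCollection")

while True:
    result = container.scripts.execute_stored_procedure(
        sproc="bulkDeleteSproc",                 # assumed id of the registered procedure
        partition_key="pop-9q",                  # the partition whose data should be purged
        params=["SELECT c._self FROM c"])        # which documents in that partition to delete
    print(f"Deleted {result['deleted']} documents in this round")
    if not result.get("continuation"):           # the sproc reports when nothing is left
        break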

Can we insert data into Firebase Realtime Database?

One child node of my Firebase Realtime Database has become huge (around 20 GB), and I need to purge it and insert the extracted data of the last month from the backup into the Firebase Realtime Database using the Python Admin SDK.
In the documentation, I see the following options:
set - Write or replace data to a defined path, like messages/users/
update - Update some of the keys for a defined path without replacing all of the data
push - Add to a list of data in the database. Every time you push a new node onto a list, your database generates a unique key, like messages/users//
transaction - Use transactions when working with complex data that could be corrupted by concurrent updates
However, I want to add/insert the data from the Firebase backup. I have to insert because the app is used in production and I cannot afford to overwrite data.
Is there any method available to insert/add the data and not overwrite the data?
Any help/support is greatly appreciated.
There is no way to do this in Firebase Realtime Database without reading the current value of each location.
The only operation that allows you to update data based on its existing value is a transaction. A Firebase transaction gives you the (likely) current value at a location, and you then return what the new value should become.
But if the data you're restoring is (largely) the same as the data you have in the database, you might be able to use an update() call with sufficiently deep paths.
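As a rough illustration of those two options with the Python Admin SDK (the service account file, database URL, paths, and data are made up for the example):

import firebase_admin
from firebase_admin import credentials, db

# initialise the Admin SDK (service account path and database URL are placeholders)
cred = credentials.Certificate("service-account.json")
firebase_admin.initialize_app(cred, {"databaseURL": "https://your-project.firebaseio.com"})

# Option 1: deep-path update() - only the exact paths named in the dict are written,
# so sibling data that already exists in production is left untouched.
db.reference("messages").update({
    "users/alice/2024-05-01/total": 42,
    "users/bob/2024-05-02/total": 17,
})

# Option 2: transaction() - receive the (likely) current value and decide what to write,
# e.g. keep whatever is already live and only fill in missing data from the backup.
backup_value = {"total": 42}

def restore_if_missing(current):
    return current if current is not None else backup_value

db.reference("messages/users/alice/2024-05-01").transaction(restore_if_missing)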

best practice for bulk update in document DB

We have a scenario where we need to repopulate the collection every hour with the latest data whenever we receive the data file in blob storage from external sources, and at the same time we do not want to impact live users while updating the collection.
So, we have done the following:
Created 2 databases, with collection 1 in both databases
Created another collection in a different database (a configuration database) with Active and Passive properties, whose values are Database1 and Database2
Now, our web job runs every time it sees the file in blob storage; it checks this configuration database to identify which database is active and which is passive, processes the XML file, and updates the collection in the passive database, as that is not used by the live feed. Once it is done, it updates the configuration so that the freshly loaded database becomes active and the other becomes passive.
Now our service always checks which database is active and which is passive, fetches the data accordingly, and shows it to the user.
Since the web job has to delete the data and insert the new data, we wanted to know: is this the best design we have come up with? Will deleting and inserting the data cost us? Is there a better way to do bulk deletes and inserts, as we are doing them sequentially now?
Is this the best design we have come up with?
As David Makogon said, with your solution you need to manage and pay for multiple databases. If possible, you could create the new documents in the same collection and control which documents are active in your program logic.
Will deleting and inserting the data cost us?
Each operation/request consumes request units, which is what you are charged for. For details on Request Units and DocumentDB pricing, please refer to:
What is a Request Unit
DocumentDB pricing details
Is there a better way to do bulk deletes and inserts, as we are doing them sequentially now?
A stored procedure provides a way to group operations like inserts and submit them in bulk. You could create the stored procedure and then execute it from your WebJobs function, as sketched below.
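A rough sketch of that grouped-insert idea, again with a JavaScript stored procedure driven from the azure-cosmos Python SDK (database/collection names, the partition key value, and the document shape are placeholders; a real run would batch per partition key and resume from the returned count if the server stops accepting writes mid-batch):

from azure.cosmos import CosmosClient

# JavaScript stored procedure body: inserts a batch of documents in one transaction,
# scoped to a single partition key, and returns how many documents were created.
BULK_INSERT_JS = """
function bulkInsert(docs) {
    var collection = getContext().getCollection();
    var count = 0;
    if (!docs || docs.length === 0) { getContext().getResponse().setBody(0); return; }

    createNext();

    function createNext() {
        var accepted = collection.createDocument(collection.getSelfLink(), docs[count],
            function (err) {
                if (err) throw err;
                count++;
                if (count < docs.length) createNext();           // insert the next document
                else getContext().getResponse().setBody(count);  // all done
            });
        // if the server stops accepting writes, report how many made it so the
        // caller can resume with the remaining documents
        if (!accepted) getContext().getResponse().setBody(count);
    }
}
"""

client = CosmosClient("https://<account>.documents.azure.com:443/", "<key>")  # placeholders
container = client.get_database_client("PassiveDatabase").get_container_client("collection1")

container.scripts.create_stored_procedure(body={"id": "bulkInsert", "body": BULK_INSERT_JS})

new_docs = [{"id": "1", "partitionKey": "feed-1", "value": 42},
            {"id": "2", "partitionKey": "feed-1", "value": 17}]
inserted = container.scripts.execute_stored_procedure(
    sproc="bulkInsert", partition_key="feed-1", params=[new_docs])
print(f"Inserted {inserted} documents in one call")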
