We've implemented a service with multiple Azure Functions in it. An HttpTrigger function inserts records into Cosmos DB, and a Cosmos DB Trigger function processes those records when it fires. Sometimes the jobs for the DB trigger function can take quite some time. If new records are inserted into Cosmos DB while the DB Trigger function is still running the previous job, the new records do not cause the trigger to fire.
How can we prevent this from happening?
Thanks!
Found the cause of the problem: we were not looping through the change feed. Instead, we were processing only the first item from the trigger's input.
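For reference, a minimal sketch of a Cosmos DB-triggered function that processes the whole change-feed batch instead of just the first item. The database, collection, lease collection, and connection-setting names are placeholders, and this assumes the Microsoft.Azure.WebJobs.Extensions.CosmosDB binding that delivers changes as an IReadOnlyList<Document>:

using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ProcessRecords
{
    [FunctionName("ProcessRecords")]
    public static void Run(
        [CosmosDBTrigger(
            databaseName: "MyDatabase",
            collectionName: "MyCollection",
            ConnectionStringSetting = "CosmosDBConnection",
            LeaseCollectionName = "leases",
            CreateLeaseCollectionIfNotExists = true)] IReadOnlyList<Document> input,
        ILogger log)
    {
        // The trigger delivers a batch of changed documents, not a single record.
        // Loop over the whole batch instead of handling only input[0].
        if (input == null || input.Count == 0) return;

        foreach (Document doc in input)
        {
            log.LogInformation($"Processing document {doc.Id}");
            // ... long-running per-record work goes here ...
        }
    }
}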
We have a business requirement to deprecate certain field values ("State"). So we need to scan the entire table, find the records with these deprecated field values, take the latest record for each partition key (there can be multiple records per partition key; the sort key is LastUpdatedTimeepoch), and then update that record. Right now the table contains around 600k records. What's the best way to do this without bringing down the db service in production?
I see that this thread could help me:
https://stackoverflow.com/questions/36780856/complete-scan-of-dynamodb-with-boto3
But my main concern is this:
This is a one-time activity. Since it will take a long time, we cannot run it in AWS Lambda because it would exceed the 15-minute limit. So where can I keep the code running?
Create an EC2 instance, assign it an IAM role with access to DynamoDB, and run the function on that instance.
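If it helps, here is a rough sketch of the kind of one-off job that could run on that EC2 instance, written with the AWS SDK for .NET. The table name and the partition-key attribute name ("PK") are placeholders, the deprecated and replacement "State" values are made up, and the throttling is deliberately crude; treat this as an outline, not production code:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

class DeprecatedStateFixer
{
    static async Task Main()
    {
        var client = new AmazonDynamoDBClient(); // picks up the EC2 instance role credentials
        var latestPerPartition = new Dictionary<string, Dictionary<string, AttributeValue>>();

        Dictionary<string, AttributeValue> startKey = null;
        do
        {
            // Paginated scan: DynamoDB returns at most 1 MB per page, so follow LastEvaluatedKey.
            // The "#st" alias avoids any clash between the attribute name and reserved words.
            var page = await client.ScanAsync(new ScanRequest
            {
                TableName = "MyTable",
                FilterExpression = "#st = :deprecated",
                ExpressionAttributeNames = new Dictionary<string, string> { ["#st"] = "State" },
                ExpressionAttributeValues = new Dictionary<string, AttributeValue>
                {
                    [":deprecated"] = new AttributeValue { S = "OldStateValue" }
                },
                ExclusiveStartKey = startKey
            });

            // Keep only the newest record per partition key (largest LastUpdatedTimeepoch).
            foreach (var item in page.Items)
            {
                string pk = item["PK"].S;
                long ts = long.Parse(item["LastUpdatedTimeepoch"].N);
                if (!latestPerPartition.TryGetValue(pk, out var current) ||
                    ts > long.Parse(current["LastUpdatedTimeepoch"].N))
                {
                    latestPerPartition[pk] = item;
                }
            }
            startKey = page.LastEvaluatedKey;
        } while (startKey != null && startKey.Count > 0);

        // Update the latest record of each affected partition key, throttled so the
        // production table's capacity is not exhausted.
        foreach (var item in latestPerPartition.Values)
        {
            await client.UpdateItemAsync(new UpdateItemRequest
            {
                TableName = "MyTable",
                Key = new Dictionary<string, AttributeValue>
                {
                    ["PK"] = item["PK"],
                    ["LastUpdatedTimeepoch"] = item["LastUpdatedTimeepoch"]
                },
                UpdateExpression = "SET #st = :newValue",
                ExpressionAttributeNames = new Dictionary<string, string> { ["#st"] = "State" },
                ExpressionAttributeValues = new Dictionary<string, AttributeValue>
                {
                    [":newValue"] = new AttributeValue { S = "ReplacementStateValue" }
                }
            });
            await Task.Delay(50); // crude rate limiting; tune to the table's provisioned capacity
        }
        Console.WriteLine($"Updated {latestPerPartition.Count} records.");
    }
}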
According to https://www.sqlite.org/pragma.html#pragma_count_changes, this pragma should no longer be used in new projects.
From within a SQLite INSTEAD OF UPDATE trigger, how can I reliably return to the app (which executed the UPDATE command that fired the trigger) how many records were affected by it?
I have a CosmosDB collection with a number of different partitions. I want to delete all of the data in one of the partitions so I tried to run the command:
db.myCollection.deleteAll({PartitionKey: 'pop-9q'})
where PartitionKey is the field that I partition/shard on. But when I execute this, it returns the not-very-helpful message:
ERROR: An Error has occurred
Why would I be getting this message and how can I either get more details on the cause or find a resolution?
At this time you are unable to perform a bulk delete. Please upvote and comment on this feature request: Add the ability to delete ALL data in a partition
Additionally, which API are you consuming? For the Gremlin API you could execute something like the following: g.V().drop()
The Microsoft.Azure.Cosmos SDK has since added this capability. It is currently only available as a preview feature (which requires you to opt in via the portal).
See here for more details:
https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-delete-by-partition-key?tabs=dotnet-example
Sample code included there:
// Get reference to the container
var container = cosmosClient.GetContainer("DatabaseName", "ContainerName");
// Delete by logical partition key
ResponseMessage deleteResponse = await container.DeleteAllItemsByPartitionKeyStreamAsync(new PartitionKey("Contoso"));
if (deleteResponse.IsSuccessStatusCode) {
Console.WriteLine($"Delete all documents with partition key operation has successfully started");
}
As Mike said, a "delete all data" feature is not yet supported in the Cosmos DB SQL API or Mongo API. I notice that you have already added comments on the feedback item linked above. Here is a workaround for the Cosmos DB SQL API that uses a bulk-delete stored procedure.
(sample code: https://gist.github.com/deepumi/2a23c5380202bddf0b85e83baf5833be)
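If you go the stored-procedure route, the client-side call might look roughly like this with the Microsoft.Azure.Cosmos SDK. It assumes a bulk-delete stored procedure (for example the one from the gist above) has already been registered on the container under the id "bulkDeleteSproc", and that it returns the number of deleted documents plus a continuation flag; the sproc id, the query, and the response shape are assumptions, not a documented API. Because server-side execution is bounded, a large partition usually needs several passes:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Scripts;

class BulkDeleteResult
{
    public int Deleted { get; set; }        // maps to the sproc's "deleted" field (assumed)
    public bool Continuation { get; set; }  // maps to the sproc's "continuation" field (assumed)
}

class BulkDeleteRunner
{
    public static async Task DeletePartitionAsync(Container container)
    {
        bool moreToDelete = true;
        int totalDeleted = 0;

        while (moreToDelete)
        {
            // The stored procedure runs transactionally inside the single logical partition
            // identified by the PartitionKey value passed here.
            var response = await container.Scripts.ExecuteStoredProcedureAsync<BulkDeleteResult>(
                "bulkDeleteSproc",
                new PartitionKey("pop-9q"),
                new dynamic[] { "SELECT c._self FROM c" });

            totalDeleted += response.Resource.Deleted;
            moreToDelete = response.Resource.Continuation;
        }

        Console.WriteLine($"Deleted {totalDeleted} documents from the partition.");
    }
}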
For the Mongo API, unfortunately, even stored procedures are not supported. You could create an Azure HTTP-triggered Function that executes the bulk-delete code whenever you want, or merge it into your application code.
Background:
We have an Event Hub where thousands of events are logged every day. An Azure Function is configured to trigger on this Event Hub when new messages arrive. The function does the following two tasks:
Write the raw message into DocumentDB (collection 1).
Upsert a summary (aggregated) message into collection 2 of DocumentDB. Before writing, it checks whether a summary message already exists based on the partition key and a unique id (not the document id); if a doc exists it updates the doc with the new aggregated value, otherwise it inserts a new doc. This unique id is derived from business logic. A minimal sketch of this flow is shown after this list.
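To make that flow concrete, here is a minimal sketch of task 2 using the Microsoft.Azure.Cosmos SDK. The SummaryDoc shape, property names, and aggregation logic are placeholders rather than the actual implementation; the important point is that the existence check and the upsert are two separate calls:

using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public class SummaryDoc
{
    public string id { get; set; }
    public string PartitionKey { get; set; }
    public string UniqueId { get; set; }     // business-logic key, not the document id
    public double Aggregate { get; set; }
}

public static class SummaryWriter
{
    public static async Task UpsertSummaryAsync(Container summaryContainer, string partitionKey, string uniqueId, double value)
    {
        // Look for an existing summary document by partition key + unique id.
        SummaryDoc existing = summaryContainer
            .GetItemLinqQueryable<SummaryDoc>(allowSynchronousQueryExecution: true)
            .Where(d => d.PartitionKey == partitionKey && d.UniqueId == uniqueId)
            .ToList()
            .FirstOrDefault();

        if (existing == null)
        {
            // No summary yet: build a brand-new document. Two concurrent executions can
            // both reach this branch, which is how duplicate summaries get created.
            existing = new SummaryDoc { id = Guid.NewGuid().ToString(), PartitionKey = partitionKey, UniqueId = uniqueId, Aggregate = value };
        }
        else
        {
            // Summary exists: fold the new value into the aggregate.
            existing.Aggregate += value;
        }

        await summaryContainer.UpsertItemAsync(existing, new PartitionKey(partitionKey));
    }
}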
Problem Statement:
More than one summary document is getting created for a PartitionKey and unique Id
Scenario Details
Let us say that, for partition key PartitionKey1, there is no summary document in the collection yet for the computed unique key.
Multiple messages (suppose 2) arrive at the Event Hub and trigger the Azure Function.
These 2 requests run concurrently. Since no existing document is found by the query, each request builds a new summary message; the Upsert is then invoked at almost the same time by both concurrent requests, and the result is multiple summary documents for the same PartitionKey and unique Id.
I've searched and read about optimistic concurrency, which I will definitely implement for the update scenario, but I could not find any way to handle the insert scenario.
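For the update scenario mentioned above, a minimal optimistic-concurrency sketch with the Microsoft.Azure.Cosmos SDK could look like this (reusing the SummaryDoc shape sketched earlier; names are placeholders). A 412 PreconditionFailed response means another writer changed the document first, so the code re-reads and retries. As noted, this does not cover the concurrent-insert case, which is what the answer below addresses:

using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class SummaryUpdater
{
    public static async Task UpdateWithRetryAsync(Container container, string id, string partitionKey, double value)
    {
        while (true)
        {
            // Read the current document together with its ETag.
            ItemResponse<SummaryDoc> read = await container.ReadItemAsync<SummaryDoc>(id, new PartitionKey(partitionKey));
            SummaryDoc doc = read.Resource;
            doc.Aggregate += value;

            try
            {
                // Replace only if nobody else changed the document since we read it.
                await container.ReplaceItemAsync(doc, id, new PartitionKey(partitionKey),
                    new ItemRequestOptions { IfMatchEtag = read.ETag });
                return;
            }
            catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.PreconditionFailed)
            {
                // Another concurrent writer updated the document first; re-read and retry.
            }
        }
    }
}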
According to your description, I suggest you use a Stored Procedure to achieve this.
Cosmos DB guarantees ACID for all operations that are part of a single stored procedure.
As the official documentation says: if the collection the stored procedure is registered against is a single-partition collection, then the transaction is scoped to all the documents within the collection. If the collection is partitioned, then stored procedures are executed in the transaction scope of a single partition key. Each stored procedure execution must then include a partition key value corresponding to the scope the transaction must run under.
For more information about stored procedures in Cosmos DB and how to create them, you can refer to:
Azure Cosmos DB server-side programming: Stored procedures, database triggers, and UDFs
Create and use stored procedures using C#
We have a scenario where we need to populate the collection with the latest data every hour, whenever we receive a data file in blob storage from external sources. At the same time, we do not want to impact live users while the collection is being updated.
So, we have done the following:
Created 2 databases, each containing one collection.
Created another collection in a separate database (a configuration database) with Active and Passive properties, whose values are Database1 and Database2.
Now, our WebJob runs every time it sees a file in blob storage. It checks this configuration database to identify which database is active and which is passive, processes the XML file, and updates the collection in the passive database, since that one is not used by the live feed. Once it is done, it swaps the roles so that the newly updated database becomes the active one.
Our service always checks which database is active and which is passive, fetches the data accordingly, and shows it to the user.
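For what it's worth, here is a minimal sketch of the active/passive lookup and swap described above, using the Microsoft.Azure.Cosmos SDK. The configuration document's shape, its id and partition key, and the database/collection names are all assumptions based on the description:

using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public class DbConfig
{
    public string id { get; set; } = "db-config";
    public string PartitionKey { get; set; } = "config";
    public string Active { get; set; }   // e.g. "Database1"
    public string Passive { get; set; }  // e.g. "Database2"
}

public static class ActivePassiveSwitch
{
    // The WebJob: load the new data into the passive database, then swap the roles.
    public static async Task RefreshAsync(CosmosClient client, Container configContainer)
    {
        DbConfig config = (await configContainer.ReadItemAsync<DbConfig>("db-config", new PartitionKey("config"))).Resource;

        Container passive = client.GetContainer(config.Passive, "DataCollection");
        // ... delete the old documents and insert the data from the blob file into `passive` here ...

        // Swap: the freshly loaded database becomes active for the live feed.
        (config.Active, config.Passive) = (config.Passive, config.Active);
        await configContainer.ReplaceItemAsync(config, config.id, new PartitionKey(config.PartitionKey));
    }

    // The service: always read from whichever database is currently active.
    public static async Task<Container> GetActiveContainerAsync(CosmosClient client, Container configContainer)
    {
        DbConfig config = (await configContainer.ReadItemAsync<DbConfig>("db-config", new PartitionKey("config"))).Resource;
        return client.GetContainer(config.Active, "DataCollection");
    }
}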
Since we have to delete the old data and insert the new data in the WebJob, we wanted to know: is this the best design we could have come up with? Does deleting and inserting the data cost anything? Is there a better way to do bulk deletes and inserts, since we are doing them sequentially now?
Is this the best design we could have come up with?
As David Makogon said, with your solution you need to manage and pay for multiple databases. If possible, you could create the new documents in the same collection and control which documents are active in your program logic.
Does deleting and inserting the data cost anything?
Each operation/request consumes Request Units, which are billed. For details on Request Units and DocumentDB pricing, please refer to:
What is a Request Unit
DocumentDB pricing details
Is there a better way to do bulk deletes and inserts, since we are doing them sequentially now?
A stored procedure provides a way to group operations like inserts and submit them in bulk. You could create the stored procedures and then execute them from your WebJob function.
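A hedged sketch of what executing such a stored procedure from the WebJob might look like. It assumes a bulk-import stored procedure (along the lines of the samples linked in the server-side programming article above) is already registered on the target collection under the id "bulkImport", accepts an array of documents, and returns the number of documents it created; all of those details are assumptions. Because a stored procedure runs in the scope of a single partition key and has bounded execution time, the caller passes documents that share one partition key value and submits them in chunks:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class BulkImporter
{
    public static async Task ImportAsync(Container container, string partitionKey, IReadOnlyList<object> docs)
    {
        const int chunkSize = 100; // tune to document size and RU budget
        for (int i = 0; i < docs.Count; i += chunkSize)
        {
            object[] chunk = docs.Skip(i).Take(chunkSize).ToArray();
            // The sproc is assumed to return the number of documents it created in this call.
            var response = await container.Scripts.ExecuteStoredProcedureAsync<int>(
                "bulkImport", new PartitionKey(partitionKey), new dynamic[] { chunk });
            Console.WriteLine($"Inserted {response.Resource} documents in this chunk.");
        }
    }
}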