Does creating a database/collection cost something in Cosmos DB? - azure-cosmosdb

Does it cost anything every time you drop/recreate a database in Cosmos DB (SQL API)?
Does it cost anything every time you drop/recreate a collection within a database in Cosmos DB (SQL API)?

Databases, collections, offers, documents, etc. all inherit from the same base object, a Resource. A Resource is basically a Cosmos DB object represented in JSON.
Creating a collection or a database is essentially creating a Resource, i.e. a document whose size doesn't exceed 1 KB, so you will be charged the minimum price for the create or read of that data.
Keep in mind, however, that Cosmos DB also charges you hourly per collection based on its provisioned RUs. For example, even if a collection exists for only a second, you will be charged one full hour of that collection's existence at its provisioned RUs.

Based on the official doc, databases, users, permissions, collections, documents, and attachments are all resources. As that doc mentions, the billing unit for Cosmos DB is the RU: any operation on a resource in Cosmos DB consumes RUs, so costs will be incurred.
I tested creating and dropping a database via the Java Cosmos DB SDK. You can see the RU consumption of your operations:
import com.microsoft.azure.documentdb.*;

public class Program {
    // Replace with your own Cosmos DB endpoint and master key
    private static final String END_POINT = "https://<your-account>.documents.azure.com:443/";
    private static final String MASTER_KEY = "<your-master-key>";

    public static void main(String[] args) throws DocumentClientException {
        DocumentClient documentClient = new DocumentClient(END_POINT, MASTER_KEY,
                ConnectionPolicy.GetDefault(), ConsistencyLevel.Session);
        // Create a database and print the request charge (RUs) for the operation
        Database database = new Database();
        database.setId("hello");
        ResourceResponse<Database> response = documentClient.createDatabase(database, null);
        System.out.println(response.getRequestCharge());
        // Delete the same database and print the request charge
        ResourceResponse<Database> response1 = documentClient.deleteDatabase("dbs/hello", null);
        System.out.println(response1.getRequestCharge());
    }
}
For more details about pricing in Cosmos DB, please see this doc.
Hope it helps you.

Creating a collection will incur one hour of billing at whatever RU/s throughput you have provisioned. The same applies to scaling: if you scale up from 1,000 RU/s to 2,000 RU/s and then immediately scale back down, you'll still be charged one hour of usage at 2,000 RU/s.
The Azure Pricing Calculator will let you break down the cost of a collection at hourly granularity based on the provisioned throughput.
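To put a rough number on it: assuming the historical single-region pay-as-you-go list price of about $0.008 per 100 RU/s per hour (check the pricing page for current figures), that one hour at 2,000 RU/s costs about 20 × $0.008 = $0.16, even if the collection only existed for a few seconds.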

Cosmos DB pricing depends entirely on the RUs reserved per collection. You don't pay anything for creating a Cosmos DB account or a database, but when you create a collection the minimum is 400 RU/s per collection, so whether you use that collection or not, you pay for it.

Related

Reading the Realtime database directly VS using Cloud Functions

I have been reading this article about reading the Realtime Database directly vs. calling Cloud Functions that return database data.
If I am returning a fairly large chunk of data, e.g. a JSON object holding 50 user comments, from a Cloud Function, does this count as outbound (egress) data? If so, does this cost $0.12 per GB per month?
The comments are stored like so, with an incremental key:
comments: [0 -> {text: "Adsadsads"},
           1 -> {text: "dadsacxdg"},
           etc.]
Furthermore, I have read that you can call goOffline() and goOnline() using the client SDKs to stop concurrent connections. Are there any costs associated with closing and opening database connections, or is it just the speed aspect of opening a connection every time you read?
Would it be more cost effective to call a Cloud Function that returns the set of 50 comments, or to let the devices read the comments directly from the database but open/close the connection before/after each read, using orderByKey(), once(), startAt() and limitToFirst()?
E.g. something like this:
ref('comments').orderByKey().startAt('0').limitToFirst(50).once('value')
Thanks
If your Cloud Function reads data from Realtime Database and returns (part of) that data to the caller, you pay for the data that is read from the database (at $1/GB) and then also for the data that your Cloud Function returns to the user (at $0.12/GB).
Opening a connection to the database means data is sent from the database to the client, and you are charged for this data (typically a few KB).
Which one is more cost effective is something you can calculate once you have all parameters. I'd recommend against premature cost optimization though: Firebase has a pretty generous free tier on its Realtime Database, so I'd start reading directly from the database and seeing how much traffic that generates. Also: if you are explicitly managing the connection state, and seem uninterested in the realtime nature of Firebase, there might be better/cheaper alternatives than Firebase to fill your needs.
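If you do start with direct reads, here is a minimal sketch of that query using the Firebase Android SDK in Java. The "comments" path and the page size of 50 come from the question; the class and log tag are illustrative, and it assumes Firebase is already initialized in the app:

import android.util.Log;
import com.google.firebase.database.*;

public class CommentsLoader {
    public void loadFirstPage() {
        DatabaseReference commentsRef =
                FirebaseDatabase.getInstance().getReference("comments");
        // One-shot read: order by key and take the first 50 children.
        // startAt() would only be needed when paging past the first batch.
        commentsRef.orderByKey().limitToFirst(50)
                .addListenerForSingleValueEvent(new ValueEventListener() {
                    @Override
                    public void onDataChange(DataSnapshot snapshot) {
                        for (DataSnapshot child : snapshot.getChildren()) {
                            Log.d("Comments", String.valueOf(child.child("text").getValue()));
                        }
                        // Optionally drop the connection once the data has arrived.
                        FirebaseDatabase.getInstance().goOffline();
                    }

                    @Override
                    public void onCancelled(DatabaseError error) {
                        Log.w("Comments", "Read failed", error.toException());
                    }
                });
    }
}

Call FirebaseDatabase.getInstance().goOnline() before the next read; as noted above, each reconnect itself transfers a few KB that you are billed for.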

Scan entire dynamo db and update records based on a condition

We have a business requirement to deprecate certain field values ("**State**"). So we need to scan the entire DB, find records with these deprecated field values, take the last record for each affected partition key (there can be multiple records per partition key; the sort key is LastUpdatedTimeepoch), and update that record. Right now the table contains around 600k records. What's the best way to do this without bringing down the DB service in production?
I see this thread could help me
https://stackoverflow.com/questions/36780856/complete-scan-of-dynamodb-with-boto3
But my main concern is: this is a one-time activity, and as it will take time, we cannot run it in AWS Lambda, since it would exceed the 15-minute limit. So where can I keep the code running?
Create an EC2 instance, assign it an IAM role with access to DynamoDB, and run the function on that instance.
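For the scan-and-update job itself, here is a hedged sketch using the AWS SDK for Java v2. The table name "MyTable", the key names "PK" and "LastUpdatedTimeepoch", the attribute "State", and the deprecated/replacement values are all assumptions based on the question:

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.*;

import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DeprecateStateValues {
    public static void main(String[] args) {
        DynamoDbClient ddb = DynamoDbClient.create();

        // Paginated scan for deprecated values; the paginator follows
        // LastEvaluatedKey automatically. A small Limit keeps consumed read
        // capacity per request low so production traffic is not starved.
        ScanRequest scan = ScanRequest.builder()
                .tableName("MyTable")
                .filterExpression("#s = :old")
                .expressionAttributeNames(Map.of("#s", "State"))
                .expressionAttributeValues(Map.of(":old", AttributeValue.fromS("DEPRECATED")))
                .limit(100)
                .build();

        Set<String> partitionKeys = new HashSet<>();
        ddb.scanPaginator(scan).items()
                .forEach(item -> partitionKeys.add(item.get("PK").s()));

        // For each affected partition key, fetch only the latest record
        // (sort key LastUpdatedTimeepoch, descending) and update it.
        for (String pk : partitionKeys) {
            QueryRequest latest = QueryRequest.builder()
                    .tableName("MyTable")
                    .keyConditionExpression("PK = :pk")
                    .expressionAttributeValues(Map.of(":pk", AttributeValue.fromS(pk)))
                    .scanIndexForward(false) // newest first
                    .limit(1)
                    .build();

            for (Map<String, AttributeValue> item : ddb.query(latest).items()) {
                ddb.updateItem(UpdateItemRequest.builder()
                        .tableName("MyTable")
                        .key(Map.of("PK", item.get("PK"),
                                "LastUpdatedTimeepoch", item.get("LastUpdatedTimeepoch")))
                        .updateExpression("SET #s = :new")
                        .expressionAttributeNames(Map.of("#s", "State"))
                        .expressionAttributeValues(Map.of(":new", AttributeValue.fromS("REPLACEMENT")))
                        .build());
            }
        }
    }
}

If the scan still competes with production traffic, sleep between pages or temporarily raise the table's provisioned read capacity for the one-off run.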

Request unit limit in Cosmos DB Sql API

I'm using an ADF pipeline to store data from a data lake into the Cosmos DB SQL API. On triggering a pipeline run, I see the following error:
Currently I'm using a throughput of 5,000 RU/s for the Cosmos DB container. Please help me understand why it is showing this error.
Here is my pipeline:
To save cost, don't increase the RU setting of Cosmos DB without limit. Based on the error message you provided, Cosmos DB can't process that many rows per unit of time, so I suggest throttling the transfer. Please consider the following configuration for the sink of your copy activity.
The write batch size defaults to 10,000; decrease it if you are not too concerned about transfer speed. For more details, please refer to this document.
Additionally, you can throttle the maximum number of concurrent connections:
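Put together, the sink side of the copy activity JSON might look roughly like this; writeBatchSize and maxConcurrentConnections are the two copy-activity settings mentioned above, and the values shown are only illustrative starting points:

"sink": {
    "type": "CosmosDbSqlApiSink",
    "writeBehavior": "insert",
    "writeBatchSize": 1000,
    "maxConcurrentConnections": 4
}

A lower writeBatchSize and fewer concurrent connections spread the writes out, trading transfer speed for staying under the container's provisioned RU/s.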

Re-run all changes in Lease Collection

I created several new pipelines in Azure Data Factory to process the CosmosDB Change Feed (which goes into Blob storage for ADF processing to on-prem SQL Server), and I'd like to "resnap" the data from the leases collection to force a full re-sync. Is there a way to do this?
For clarity, my set-up is:
Change Feed -> Azure Function to process the changes -> Blob Storage to hold the JSON documents -> Azure Data Factory, which picks up the Blob Storage documents and maps them to on-prem SQL Server stored proc inserts/updates.
The easiest and simplest way to do it is to simply delete the lease documents and make sure that the StartFromBeginning setting is set to true. Once restarted, the change feed service will recreate the leases (if the appropriate setting is configured to true) and reprocess all the documents.
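Since in this set-up the change feed consumer is an Azure Function, that setting lives on the trigger. A minimal sketch in Java (the database, collection, and connection-setting names are placeholders):

import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.annotation.CosmosDBTrigger;
import com.microsoft.azure.functions.annotation.FunctionName;

public class ChangeFeedFunction {
    @FunctionName("ProcessChanges")
    public void run(
            @CosmosDBTrigger(
                name = "documents",
                databaseName = "mydb",                        // placeholder
                collectionName = "mycollection",              // placeholder
                connectionStringSetting = "CosmosConnection", // placeholder app setting
                leaseCollectionName = "leases",
                createLeaseCollectionIfNotExists = true,
                startFromBeginning = true)                    // re-read the collection from the start
            String[] documents,
            final ExecutionContext context) {
        context.getLogger().info(documents.length + " change(s) received");
    }
}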
The other way is to update every single lease document and reset its continuation token "checkpoint" to null; however, I don't recommend this method, since you might accidentally miss a lease, which can lead to issues.

CosmosDB : How to apply concurrency while inserting a document (in parallel requests)

Background:
We have an Event Hub where thousands of events are logged every day. An Azure Function is configured to trigger on this Event Hub on arrival of new messages. The Azure Function does the following two tasks:
Write the raw message into Document DB (collection 1).
Upsert a summary (aggregated) message into collection 2 of Document DB. Before writing a message it checks whether a summary message already exists, based on the partition key and a unique id (not id); if a doc exists it updates the doc with the new aggregated value, and if not it inserts a new doc. This unique id is created based on business logic.
Problem Statement:
More than one summary document is getting created for the same PartitionKey and unique id.
Scenario Details
Let us say that for partition key PartitionKey1 there is no summary document yet in the collection for the computed unique key.
Multiple messages (suppose 2) arrive at the Event Hub and trigger the Azure Function.
These 2 requests run concurrently. Since no existing document is found by the query, each request builds a message; the upsert is then invoked almost at the same time by the concurrent requests to write the summary document, resulting in multiple summary documents for one PartitionKey and unique id.
I've searched and read about optimistic concurrency, which I will definitely implement for the update scenario, but I could not find any way to handle the insert scenario.
According to your description, I suggest you use a stored procedure to achieve this.
Cosmos DB guarantees ACID for all operations that are part of a single stored procedure.
As the official documentation says: if the collection the stored procedure is registered against is a single-partition collection, then the transaction is scoped to all the documents within the collection. If the collection is partitioned, then stored procedures are executed in the transaction scope of a single partition key. Each stored procedure execution must then include a partition key value corresponding to the scope the transaction must run under.
For more information about stored procedures in Cosmos DB and how to create them, see:
Azure Cosmos DB server-side programming: Stored procedures, database triggers, and UDFs
Create and use stored procedures using C#
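A rough sketch of that approach: the stored procedure body is JavaScript (Cosmos DB server-side code runs as JavaScript) and is registered here with the same Java SDK used earlier on this page. The endpoint/key placeholders, the collection link, and the uniqueId field are assumptions, not the asker's actual schema:

import com.microsoft.azure.documentdb.*;

public class RegisterUpsertSproc {
    // Placeholders: replace with your own endpoint, key and collection link
    private static final String END_POINT = "https://<account>.documents.azure.com:443/";
    private static final String MASTER_KEY = "<master-key>";
    private static final String COLL_LINK = "dbs/mydb/colls/summaries";

    // Server-side JavaScript: the query and the subsequent create/replace run
    // as one transaction within the partition, so two concurrent executions
    // for the same partition key cannot both take the "create" branch.
    private static final String BODY =
        "function upsertSummary(doc) {" +
        "  var coll = getContext().getCollection();" +
        "  var q = 'SELECT * FROM c WHERE c.uniqueId = \"' + doc.uniqueId + '\"';" +
        "  coll.queryDocuments(coll.getSelfLink(), q, {}, function (err, found) {" +
        "    if (err) throw err;" +
        "    if (found.length > 0) {" +
        "      coll.replaceDocument(found[0]._self, doc, function (e) { if (e) throw e; });" +
        "    } else {" +
        "      coll.createDocument(coll.getSelfLink(), doc, function (e) { if (e) throw e; });" +
        "    }" +
        "  });" +
        "}";

    public static void main(String[] args) throws DocumentClientException {
        DocumentClient client = new DocumentClient(END_POINT, MASTER_KEY,
                ConnectionPolicy.GetDefault(), ConsistencyLevel.Session);
        StoredProcedure sproc = new StoredProcedure();
        sproc.setId("upsertSummary");
        sproc.setBody(BODY);
        client.createStoredProcedure(COLL_LINK, sproc, null);

        // Execute it with the partition key of the summary document, e.g.:
        // RequestOptions options = new RequestOptions();
        // options.setPartitionKey(new PartitionKey("PartitionKey1"));
        // client.executeStoredProcedure(COLL_LINK + "/sprocs/upsertSummary",
        //         options, new Object[]{ summaryDoc });
    }
}

The Azure Function would then call executeStoredProcedure once instead of querying and upserting in two separate round trips, which removes the race window between the existence check and the write.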
