I'm using an ADF pipeline to store data from a data lake into Cosmos DB (SQL API). On triggering a pipeline run, I see the following error:
Currently I'm using a throughput of 5000 RU/s for the Cosmos DB container. Please help me understand why it is showing this error.
Here is my pipeline:
To save cost, don't increase the RU setting of Cosmos DB without limit. Based on the error message you provided, Cosmos DB can't process that many rows per unit of time, so I suggest throttling the data transfer. Please consider the below configuration for the sink of your copy activity.
The default write batch size is 10,000; decrease this number if you are not concerned about transfer speed. For more details, please refer to this document.
Additionally, you could throttle the max concurrent connections:
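For illustration, this is roughly how those two settings appear in the copy activity sink JSON; the property names follow the current connector, and the values here are only an example of throttling, not a recommendation:

"sink": {
    "type": "CosmosDbSqlApiSink",
    "writeBatchSize": 1000,
    "maxConcurrentConnections": 2
}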
After selecting all the details correctly in the Migration tool, it returns an error related to the throughput value, and setting it to 0 or -1 does not help.
A workaround to migrate data using the tool is to first create the collection in Cosmos DB through the Azure portal and then run the Migration tool with the same details; it will then add all the rows to that existing collection. The main issue here was the creation of a new collection, but I don't know why it returns something related to throughput, which I think has nothing to do with it.
In general, setting an explicit value for offer_throughput is not allowed for serverless accounts. So either omit that value and the default will be applied, or change your account type.
Related issues (still open as of 23/02/2022):
https://learn.microsoft.com/en-us/answers/questions/94814/cosmos-quick-start-gt-create-items-contaner-gt-htt.html
https://github.com/Azure/azure-cosmos-dotnet-v2/issues/861
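For example, with the azure-cosmos Python SDK, where offer_throughput is an explicit parameter, a rough sketch looks like this (endpoint, key and names are placeholders):

from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", "<key>")
db = client.create_database_if_not_exists("mydb")

# On a serverless account: omit offer_throughput entirely.
container = db.create_container_if_not_exists(
    id="mycontainer",
    partition_key=PartitionKey(path="/id"),
)

# On a provisioned-throughput account you could instead pass, e.g.:
# db.create_container_if_not_exists(
#     id="mycontainer",
#     partition_key=PartitionKey(path="/id"),
#     offer_throughput=400,
# )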
I have been reading this article about reading the Realtime Database directly vs calling Cloud Functions that return database data.
If I am returning a fairly large chunk of data from a Cloud Function, e.g. a JSON object holding 50 user comments, does this count as outbound (egress) data? If so, does this cost $0.12 per GB per month?
The comments are stored like so, with an incremental key:
comments: [0 -> {text: "Adsadsads"},
           1 -> {text: "dadsacxdg"},
           etc.]
Furthermore, I have read you can call goOffline() and goOnline() using the client SDKs to stop concurrent connections. Are there any costs associated with closing and opening database connections, or is it just the speed aspect of opening a connection every time you read?
Would it be more cost effective to call a Cloud Function that returns the set of 50 comments, or to allow the devices to read the comments directly from the database but open/close the connection before/after each read, using orderByKey(), once(), startAt() and limitToFirst()?
e.g. something like this:
ref('comments').orderByKey().startAt('0').limitToFirst(50).once('value')
Thanks
If your Cloud Function reads data from Realtime Database and returns (part of) that data to the caller, you pay for the data that is read from the database (at $1/GB) and then also for the data that your Cloud Function returns to the user (at $0.12/GB).
Opening a connection to the database means data is sent from the database to the client, and you are charged for this data (typically a few KB).
Which one is more cost effective is something you can calculate once you have all parameters. I'd recommend against premature cost optimization though: Firebase has a pretty generous free tier on its Realtime Database, so I'd start reading directly from the database and seeing how much traffic that generates. Also: if you are explicitly managing the connection state, and seem uninterested in the realtime nature of Firebase, there might be better/cheaper alternatives than Firebase to fill your needs.
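To make that concrete, here is a back-of-the-envelope sketch; every number in it (comment size, read volume, connection overhead) is an assumption for illustration, and only the $1/GB and $0.12/GB rates come from the answer above:

# Assumed workload: 50 comments of ~200 bytes per read, 100,000 reads per month,
# ~2 KB of connection overhead per open/close.
payload_gb = 50 * 200 * 100_000 / 1e9      # ~1 GB of comment data per month
overhead_gb = 2_000 * 100_000 / 1e9        # ~0.2 GB of connection handshake traffic

direct_read = (payload_gb + overhead_gb) * 1.00          # RTDB download at $1/GB
via_function = payload_gb * 1.00 + payload_gb * 0.12     # DB read + function egress

print(f"direct: ${direct_read:.2f}/month, via function: ${via_function:.2f}/month")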
We have a business requirement to deprecate certain field values ("State"). So we need to scan the entire DB, find these deprecated field values, take the last record for that partition key (there can be multiple records for the same partition key; the sort key is LastUpdatedTimeepoch), and then update the record. Right now the table contains around 600k records. What's the best way to do this without bringing down the DB service in production?
I see this thread could help me
https://stackoverflow.com/questions/36780856/complete-scan-of-dynamodb-with-boto3
But my main concern is this: it is a one-time activity, and as it will take time, we cannot run it in AWS Lambda since it will exceed 15 minutes. So where can I keep the code running?
Create an EC2 instance, assign it an IAM role with access to DynamoDB, and run the function on that instance.
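A rough sketch of the scan-and-update logic with boto3 might look like the following; the table name, the "pk" key name, and the deprecated/replacement values are assumptions on my part, while State and LastUpdatedTimeepoch come from the question:

import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("MyTable")   # assumed table name
deprecated = ["OLD_STATE_1", "OLD_STATE_2"]           # assumed deprecated values

# 1. Paginated scan to collect partition keys that still carry a deprecated State.
keys, kwargs = set(), {"FilterExpression": Attr("State").is_in(deprecated),
                       "ProjectionExpression": "pk"}  # 'pk' = assumed partition key name
while True:
    page = table.scan(**kwargs)
    keys.update(item["pk"] for item in page["Items"])
    if "LastEvaluatedKey" not in page:
        break
    kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

# 2. For each partition key, fetch only the most recent item and update it.
for pk in keys:
    latest = table.query(KeyConditionExpression=Key("pk").eq(pk),
                         ScanIndexForward=False, Limit=1)["Items"][0]
    table.update_item(
        Key={"pk": pk, "LastUpdatedTimeepoch": latest["LastUpdatedTimeepoch"]},
        UpdateExpression="SET #s = :new",
        ExpressionAttributeNames={"#s": "State"},        # alias in case State is reserved
        ExpressionAttributeValues={":new": "NEW_STATE"}, # assumed replacement value
    )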
I created several new pipelines in Azure Data Factory to process the Cosmos DB change feed (the changes go into Blob Storage for ADF processing into on-prem SQL Server), and I'd like to "resnap" the data from the leases collection to force a full re-sync. Is there a way to do this?
For clarity, my set-up is:
Change Feed ->
Azure Function to process the changes -> Blob Storage to hold the JSON documents -> Azure Data Factory which picks up the Blob Storage documents and maps them to on-prem SQL Server stored proc inserts/updates.
The easiest and simplest way to do it is to simply delete the lease documents and make sure that the StartFromBeginning setting is set to true. Once restarted, the change feed service will recreate the leases (if the appropriate setting is configured to true) and reprocess all the documents.
The other way is to update every single lease document and reset its continuation token "checkpoint" to null; however, I don't recommend this method, since you might accidentally miss a lease, which can lead to issues.
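If it helps, here is a minimal sketch of clearing the leases container with the azure-cosmos Python SDK; the account, key, database, and container names are placeholders, and it assumes the lease container is partitioned on /id (the usual setup for the Azure Functions trigger):

from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", "<key>")
leases = client.get_database_client("<database>").get_container_client("leases")

# Delete every lease document; with StartFromBeginning set to true, the
# change feed processor will recreate them and replay the whole collection.
for doc in list(leases.query_items("SELECT * FROM c", enable_cross_partition_query=True)):
    leases.delete_item(doc, partition_key=doc["id"])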
Does it cost anything every time you drop/recreate a database in Cosmos DB (SQL API)?
Does it cost anything every time you drop/recreate a collection within a database in Cosmos DB (SQL API)?
Databases, collections, offers, documents, etc. all inherit from the same single object, a Resource. A Resource is basically a Cosmos DB object represented as JSON.
Creating a collection or a database is essentially creating a Resource, i.e. a document whose size doesn't exceed 1 KB, so you will be charged the minimum price for the create or read of that data.
Keep in mind, however, that Cosmos DB also charges you hourly per collection based on its provisioned RUs. For example, if you create a collection even for a second, you will be charged for an hour of that collection's existence at the provisioned RUs.
Based on the official doc, databases, users, permissions, collections, documents and attachments are all resources. As mentioned in that doc, the billing unit for Cosmos DB is the RU. Any operation on resources in Cosmos DB consumes RUs, so costs will be incurred.
I tested the code below, which creates and drops a database via the Java Cosmos DB SDK. You can see the RUs consumed by each of your operations.
import com.microsoft.azure.documentdb.*;

public class RequestChargeDemo {
    // Replace with your Cosmos DB account endpoint and key.
    private static final String END_POINT = "https://<account>.documents.azure.com:443/";
    private static final String MASTER_KEY = "<master-key>";

    public static void main(String[] args) throws DocumentClientException {
        DocumentClient documentClient = new DocumentClient(END_POINT, MASTER_KEY,
                ConnectionPolicy.GetDefault(), ConsistencyLevel.Session);
        // Create a database and print the request charge (RUs) consumed.
        Database database = new Database();
        database.setId("hello");
        ResourceResponse<Database> response = documentClient.createDatabase(database, null);
        System.out.println(response.getRequestCharge());
        // Delete the database and print the request charge of the delete.
        ResourceResponse<Database> response1 = documentClient.deleteDatabase("dbs/hello", null);
        System.out.println(response1.getRequestCharge());
    }
}
For more details about pricing in Cosmos DB, please see this doc.
Hope it helps you.
Creating a collection will incur one hour of billing at whatever RU/s throughput you have provisioned. The same applies to scaling: if you scale up from 1000 RU/s to 2000 RU/s and then back down immediately, you'll still be charged for one hour of usage at 2000 RU/s.
The Azure Pricing Calculator will let you break down the cost of a collection in hourly granularity based on the provisioned throughput.
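For example, assuming single-region provisioned throughput at roughly $0.008 per 100 RU/s per hour (an assumed figure; check the calculator for current rates), an hour at 2000 RU/s comes to about 20 × $0.008 ≈ $0.16, versus about $0.08 for an hour at 1000 RU/s.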
Cosmos DB pricing depends entirely on the reserved RUs per collection. You don't pay anything for creating a Cosmos DB account or a database, but when you create a collection the minimum is 400 RU/s per collection, so you pay whether or not you use that collection.
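As a rough illustration, at an assumed rate of about $0.008 per 100 RU/s per hour, a collection provisioned at the 400 RU/s minimum comes to roughly 4 × $0.008 × 730 ≈ $23 per month, even if it is never queried.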