Cosmos DB SQL API vs MongoDB API: which one to use for my scenario? - azure-cosmosdb

I have a document called "chat":

"Chat": [
  {
    "User": {},
    "Message": "i have a question",
    "Time": "06:55 PM"
  },
  {
    "User": {},
    "Message": "will you be able to",
    "Time": "06:25 PM"
  },
  {
    "User": {},
    "Message": "ok i will do that",
    "Time": "07:01 PM"
  }
]
Every time a new chat message arrives, I should be able to simply append to this array.
The MongoDB API aggregation pipeline (preview) allows me to use operators like $push and $addToSet for that.
If I use the SQL API, I will have to pull the entire document, modify it, and replace the whole document every time.
Other considerations:
This array can grow rapidly.
This "chat" document might also be nested inside other documents.
My Question
Does this mean that the MongoDB API is better suited for this, and that the SQL API will take a performance hit in this scenario?

Does this mean that the MongoDB API is better suited for this, and that the SQL API will take a performance hit in this scenario?
It's hard to say which database is the best choice.
Yes, as noted in the docs, the Cosmos DB Mongo API supports $push and $addToSet, which is more efficient. Keep in mind, however, that the Cosmos DB Mongo API only supports a subset of MongoDB features and translates requests into their Cosmos DB SQL equivalents, so it may show some different behaviours and results. But the onus is on the Cosmos DB Mongo API to improve its emulation of MongoDB.
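As a rough illustration, this is what the append could look like through the Mongo API with the MongoDB .NET driver. It is only a sketch: the connection string, database, collection and document id are placeholders, and "Chat" is assumed to be a top-level array.

using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("<cosmos-mongo-connection-string>");
var collection = client.GetDatabase("chatdb").GetCollection<BsonDocument>("chats");

var newMessage = new BsonDocument
{
    { "User", new BsonDocument() },
    { "Message", "another question" },
    { "Time", "07:15 PM" }
};

// $push appends server-side; only the new element is sent, not the whole document.
var filter = Builders<BsonDocument>.Filter.Eq("_id", "chat-123");
var update = Builders<BsonDocument>.Update.Push("Chat", newMessage);
await collection.UpdateOneAsync(filter, update);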
When it comes to the Cosmos DB SQL API, partial update is not supported so far, but it is on the roadmap. You could submit feedback here. Currently, you need to replace the entire document. You could, however, use a stored procedure to do this job and take the pressure off your client side.
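For comparison, here is a minimal sketch of that read-modify-replace cycle with the SQL API .NET SDK (Microsoft.Azure.DocumentDB). The ChatDocument shape, database, collection and id are assumptions for illustration only.

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Newtonsoft.Json;

public class ChatEntry
{
    public object User { get; set; }
    public string Message { get; set; }
    public string Time { get; set; }
}

public class ChatDocument
{
    [JsonProperty("id")]
    public string Id { get; set; }
    public List<ChatEntry> Chat { get; set; } = new List<ChatEntry>();
}

public static class ChatUpdater
{
    public static async Task AppendMessageAsync(DocumentClient client, ChatEntry entry)
    {
        var uri = UriFactory.CreateDocumentUri("chatdb", "chats", "chat-123");

        // The whole document comes over the wire...
        var response = await client.ReadDocumentAsync<ChatDocument>(uri);
        var doc = response.Document;

        doc.Chat.Add(entry);

        // ...and the whole document goes back. For a partitioned collection,
        // pass RequestOptions with the PartitionKey on both calls.
        await client.ReplaceDocumentAsync(uri, doc);
    }
}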
The next thing I want to mention, which is the most important, is the limitation mentioned by @David: the document size limit is 2 MB in the SQL API and 4 MB in the Mongo API (see What is the size limit of a cosmosdb item?). Since your chat data keeps growing, you need to consider splitting it across documents, and then give the documents a partition key such as "type": "chatdata" to classify them.
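For instance, instead of one ever-growing "chat" document, each chunk of messages could be its own small document sharing the partition key. This is only a sketch; the id and chatId properties are made up for illustration.

{
  "id": "chat-123-chunk-0001",
  "type": "chatdata",
  "chatId": "chat-123",
  "Chat": [
    { "User": {}, "Message": "i have a question", "Time": "06:55 PM" }
  ]
}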

Related

Determine if Cosmos DB NotFound due to missing collection vs. document

Is there a way to programmatically determine from a DocumentClientException where StatusCode == HttpStatusCode.NotFound whether it was the document, the collection, or the database that was not found?
I'm trying to figure out whether I can implement on-demand collection provisioning and only call DocumentClient.CreateDocumentCollectionIfNotExistsAsync when I need to. I'm trying to avoid calling it before making every request (presumably this adds an extra network roundtrip to every request). Likewise, I'm trying to avoid calling it on error recovery when I know it won't help.
From experimentation with the local emulator, the only field I see varying in these three cases is DocumentClientException.Error.Message, and only when the database cannot be found. I generally try to avoid exception dispatching based on human-readable messages.
Wrong database name:
StatusCode: HttpStatusCode.NotFound
Error.Message: {\"Errors\":[\"Owner resource does not exist\"]}...
Correct database name, wrong collection name:
StatusCode: HttpStatusCode.NotFound
Error.Message: {\"Errors\":[\"Resource Not Found\"]}...
Correct database name, correct collection name, incorrect document ID:
StatusCode: HttpStatusCode.NotFound
Error.Message: {\"Errors\":[\"Resource Not Found\"]}...
I'm planning to use a database with its own offer. Since collections inside a database with its own offer are cheap, I'm trying to see whether I can segregate each tenant in my multi-tenant application into its own collection. Each tenant ends up having a different indexing and default TTL policy. The set of collections is not fixed and changes dynamically during runtime as new tenants sign up. I cannot predict when I will need to add a new collection. There's no new tenant notification: I just get a request that I need to handle by creating a document in a possibly non-existent collection. There's a process to garbage collect unused collections.
I'm using the NuGet package Microsoft.Azure.DocumentDB.Core Version 1.9.1 in a .NET Core 2.1 app targeting a SQL API Cosmos DB instance.
If you look at the Message property in detail, you should see the following strings, which indicate whether the 404 Not Found response was generated for a document or a collection:
ResourceType: Document
ResourceType: Collection
It's not ideal, but you can try to regex this information out of the error message.
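A rough sketch of what that could look like, assuming the Microsoft.Azure.DocumentDB.Core SDK from the question (the exact message format is undocumented and could change):

using System.Net;
using System.Text.RegularExpressions;
using Microsoft.Azure.Documents;

try
{
    // e.g. await client.ReadDocumentAsync(documentUri);
}
catch (DocumentClientException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
{
    // The message text contains "ResourceType: Document" or "ResourceType: Collection".
    var match = Regex.Match(ex.Message, @"ResourceType:\s*(\w+)");
    var resourceType = match.Success ? match.Groups[1].Value : "Unknown";

    if (resourceType == "Collection")
    {
        // Collection is missing: provision it on demand, then retry.
    }
    else if (resourceType == "Document")
    {
        // Document is missing: treat as a normal not-found case.
    }
}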

How do you connect to a Cosmos DB (primarily updated via SQL API) using Gremlin.Net? (Can you?)

I'm working on a Cosmos DB app that stores both standard documents and graph documents. We are saving both types via the DocumentDB API, and I am able to run graph queries that return GraphSON using the DocumentClient.CreateGremlinQuery method. This GraphSON is to be read by a web app so the graph can be displayed for user viewing and so on.
My issue is that I cannot define the version of the GraphSON format returned when using the Microsoft.Azure.Graphs method. So I looked into Gremlin.Net, which, judging by its documentation, has a lot more options in this regard.
However, I am finding it difficult to connect to the Cosmos document DB using Gremlin.Net. The server variable, which you define like this:
var server = new GremlinServer("https://localhost/", 8081, enableSsl: true, username: $"/dbs/TheDatabase/colls/TheCOllection", password: "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==");
then results in a URI with "/gremlin" appended, and it cannot locate the database endpoint.
Has anyone used Gremlin.Net to connect to a Cosmos database that has been set up as a document DB rather than a graph DB? The documents in it are graph/Gremlin compatible in their format, with _isEdge / label / _sink etc.
Cheers,
Mark (Document db/Gremlin/graph newbie)

How to improve the performance when copying data from cosmosdb?

I am currently trying to copy data from Cosmos DB to Data Lake Store with Data Factory.
However, the performance is poor: about 100 KB/s, and the data volume is 100+ GB and keeps increasing. It will take 10+ days to finish, which is not acceptable.
The Microsoft document https://learn.microsoft.com/en-us/azure/data-factory/data-factory-copy-activity-performance mentions that the max speed from Cosmos DB to Data Lake Store is 1 MB/s. Even at that rate, the performance is still not good enough for us.
The Cosmos DB migration tool doesn't work: no data is exported, and there is no error log.
Data Lake Analytics U-SQL can extract from external sources, but currently only Azure SQL DB/DW and SQL Server are supported, not Cosmos DB.
How, or with what tools, can I improve the copy performance?
Based on your description, I suggest you try setting a higher cloudDataMovementUnits value to improve the performance.
A cloud data movement unit (DMU) is a measure that represents the power (a combination of CPU, memory, and network resource allocation) of a single unit in Data Factory. A DMU might be used in a cloud-to-cloud copy operation, but not in a hybrid copy.
By default, Data Factory uses a single cloud DMU to perform a single Copy Activity run. To override this default, specify a value for the cloudDataMovementUnits property as follows. For information about the level of performance gain you might get when you configure more units for a specific copy source and sink, see the performance reference.
Note: a setting of 8 and above currently works only when you copy multiple files from Blob storage/Data Lake Store/Amazon S3/cloud FTP/cloud SFTP to Blob storage/Data Lake Store/Azure SQL Database.
So, since Cosmos DB is not in that list of sources, the max DMU you could set is 4.
Besides, if this speed still doesn't meet your requirements, I suggest you write your own logic to copy from DocumentDB to Data Lake.
You could create multiple WebJobs that copy from DocumentDB to Data Lake in parallel.
You could split the documents by index range or partition and have each WebJob copy a different part. In my opinion, this will be faster.
About the DMU: can I use it directly, or do I need to apply for it first? Are the WebJobs you mention a .NET activity? Can you give some more details?
As far as I know, you can use the DMU setting directly; you can add the cloudDataMovementUnits value in the JSON file as below:
"activities":[
{
"name": "Sample copy activity",
"description": "",
"type": "Copy",
"inputs": [{ "name": "InputDataset" }],
"outputs": [{ "name": "OutputDataset" }],
"typeProperties": {
"source": {
"type": "BlobSource",
},
"sink": {
"type": "AzureDataLakeStoreSink"
},
"cloudDataMovementUnits": 32
}
}
]
WebJobs can run programs or scripts in your Azure App Service web app in three ways: on demand, continuously, or on a schedule.
That means you could write a C# program (or use another language) to copy the data from DocumentDB to Data Lake; all of the copy logic has to be written by yourself.
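A very rough sketch of what one such WebJob could look like with the DocumentDB .NET SDK. This is only an illustration: the bucket field used to slice the data and the UploadToDataLakeAsync helper are placeholders you would replace with your own partitioning scheme and the Data Lake Store SDK.

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;
using Newtonsoft.Json;

public static class ParallelExporter
{
    // Each WebJob instance is given a different [rangeStart, rangeEnd) slice,
    // so several copies can run in parallel.
    public static async Task ExportRangeAsync(DocumentClient client, Uri collectionUri, int rangeStart, int rangeEnd)
    {
        var query = client.CreateDocumentQuery(
                collectionUri,
                $"SELECT * FROM c WHERE c.bucket >= {rangeStart} AND c.bucket < {rangeEnd}",
                new FeedOptions { EnableCrossPartitionQuery = true, MaxItemCount = 1000 })
            .AsDocumentQuery();

        int page = 0;
        while (query.HasMoreResults)
        {
            var batch = await query.ExecuteNextAsync();
            string json = JsonConvert.SerializeObject(batch);

            // Placeholder: push each page to Data Lake Store with the ADLS SDK.
            await UploadToDataLakeAsync($"export/{rangeStart}-{rangeEnd}/page-{page++}.json", json);
        }
    }

    private static Task UploadToDataLakeAsync(string path, string json)
    {
        // Implement with the Data Lake Store file system client of your choice.
        return Task.CompletedTask;
    }
}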

Is there a way to get a webhook called on every Firebase Realtime DB transaction?

Basically, I'm using Firebase as my realtime database from my iOS application. However, I'd love to get a "transaction log" sent to my server, even if it is not realtime. Is there a way to set this up? Maybe with a webhook?
You can use Cloud Functions for Firebase to write a database trigger that runs some JavaScript (running in a node.js environment) whenever data in your database changes. You can effectively use this to send changes from your Realtime Database to whatever other server you control. You would probably have to implement a webhook or some other endpoint on your server to receive the data.
In order to make outbound network requests in a function like this, you will need to upgrade your project to the Blaze plan if you haven't already.
I can't understand your question very well; what are you trying to achieve?
You can "save" an action done in your app with Firebase in at least three ways:
1) Create a node to log the transactions and use child updates to save a transaction in its own node and, at the same time, in another node like "transactionsLog" (https://firebase.google.com/docs/database/ios/read-and-write#update_specific_fields)
2) If you only need to track events done in your app by users, you can use Firebase Analytics and log an event every time there is a transaction: https://firebase.google.com/docs/analytics/ios/start (get started) and https://firebase.google.com/docs/analytics/ios/start#log_events (to log events)
3) To do more complex actions when a transaction has been done, you can use a Firebase Function (https://firebase.google.com/docs/functions/), where you can do everything you want in JavaScript (it is a Node.js environment).
Example of the first method (Swift):
let key = ref.child("transactions").childByAutoId().key
let post: [String: Any] = ["uid": userID,
                           "author": username,
                           "amount": amount,
                           "date": date]
let childUpdates: [String: Any] = ["/transactions/\(key)": post,
                                   "/transactionsLog/\(userID)/\(key)/": true]
ref.updateChildValues(childUpdates)

Serverless framework with DynamoDB: Lambda function works, but data isn't saved to DynamoDB

I am having some trouble with the Serverless framework and DynamoDB.
After my Lambda function executes, context.succeed(result) returns the result, but nothing is written to DynamoDB.
Here is the link to the demo repo.
I've read this question, and I added the resource to s-resources-cf.json, then ran serverless resources deploy again.
After sending the request, it still does nothing with DynamoDB.
Here's what I've done:
Created a table posts with a primary key in a specific region
Attached AdministratorAccess to my IAM role (I know it's bad practice)
Added {"Effect": "Allow", "Action": ["*"], "Resource": "arn:aws:dynamodb:${region}:*:table/*"} to s-resources-cf.json
Is there anything I still misunderstand?
Your demo repo does not appear to include the AWS SDK or set the region as noted in the Getting Started guide, i.e.:
var AWS = require("aws-sdk");
var DOC = require("dynamodb-doc");
AWS.config.update({region: "us-west-1"});
var docClient = new DOC.DynamoDB();
...
Note that dynamodb-doc was deprecated almost a year ago. You may want to try the DynamoDB DocumentClient instead. This updated API has much clearer error-handling semantics that will probably help point out where the problem is.
