Bulk delete support in Cosmos DB using .NET SDK - azure-cosmosdb

Based on the documentation for the Cosmos DB bulk executor (https://learn.microsoft.com/en-us/azure/cosmos-db/bulk-executor-dot-net), bulk delete is supported via the bulk executor.
However, the examples for the new bulk support within the .NET SDK (https://devblogs.microsoft.com/cosmosdb/introducing-bulk-support-in-the-net-sdk/) do not explicitly say anything about deletion.
I want to understand whether there are any drawbacks to deleting several documents using the new bulk execution support, or whether it is okay to follow a pattern similar to the "Create" flow described in the sample.

When Bulk mode is enabled, any point operation (ReadItem, CreateItem, UpsertItem, DeleteItem, ReplaceItem) benefits from it. Just follow the same pattern of concurrent Tasks, but use DeleteItem instead of CreateItem (you could even mix different operation types).
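The concurrent-task pattern described above can be sketched as follows. This is a minimal illustration in Python with asyncio, not .NET SDK code: `FakeContainer` and its `delete_item` coroutine are hypothetical in-memory stand-ins, and the `id`/`pk` field names are assumptions. In .NET the analogous shape would be a list of DeleteItem tasks awaited together while Bulk mode is enabled.

```python
import asyncio


class FakeContainer:
    """Hypothetical in-memory stand-in for a container (not a real SDK class)."""

    def __init__(self, items):
        # Index documents by (id, partition key), the pair a point operation needs.
        self._items = {(doc["id"], doc["pk"]): doc for doc in items}

    async def delete_item(self, item_id, partition_key):
        # Yield to the event loop to mimic a network round trip, then delete.
        await asyncio.sleep(0)
        if (item_id, partition_key) not in self._items:
            raise KeyError(f"item {item_id!r} not found")
        del self._items[(item_id, partition_key)]

    def count(self):
        return len(self._items)


async def bulk_delete(container, keys):
    """Start one delete per item and await them all together.

    Returns (deleted_count, failures), where failures is a list of
    (key, exception) pairs, so one failed delete does not hide the rest.
    """
    tasks = [container.delete_item(item_id, pk) for item_id, pk in keys]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    failures = [
        (key, res) for key, res in zip(keys, results) if isinstance(res, Exception)
    ]
    return len(keys) - len(failures), failures


def run_demo():
    docs = [{"id": str(i), "pk": "tenant-a"} for i in range(5)]
    container = FakeContainer(docs)
    # Five existing items plus one id that does not exist.
    keys = [(str(i), "tenant-a") for i in range(5)] + [("missing", "tenant-a")]
    deleted, failures = asyncio.run(bulk_delete(container, keys))
    return deleted, len(failures), container.count()
```

Collecting failures per item rather than stopping at the first exception keeps a single missing document from masking the outcome of the other deletes.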

Related

Does AWS AppSync have scan operations to scan DynamoDB

I am building a serverless web app with AWS Amplify, GraphQL, and DynamoDB. I want to know what exactly a scan operation is in this context. For example, I have a User table, and the queries listUsers and getUser were generated from the Amplify schema. Are they scan operations or queries?
Thank you in advance for your answers. I could only find the definition of a scan operation, but no examples that would help me identify one when it comes to GraphQL.
Amplify uses Filter Expressions which are a type of Query.
You can see this yourself by looking at the .vtl files that amplify generates and uploads to appsync.
They are located here: amplify/#current-cloud-backend/api/[API NAME]/build/resolvers
In that folder you can open up one of the Query.list[Model].req.vtl. Even if you are not familiar with Velocity Template Language you can still get the idea. You can see that it uses the expression $util.transform.toDynamoDBFilterExpression.
For more information, see the documentation for that util, in particular the entry for toDynamoDBFilterExpression.

Using Java to stream index into Elasticsearch from DynamoDB using trigger

I am a novice with DynamoDB and Elasticsearch.
I need to stream indexes into Elasticsearch from a DynamoDB table via a trigger, using Java. That is, whenever a new record is inserted into the DynamoDB table, the same record has to be updated in Elasticsearch.
Most of the examples available on the web are either incomplete or implemented in Python or Node.js. An explanation of how to achieve this in Java, or any links or reference articles, would be welcome.

Can Cosmos DB read data from File Blob or Csv or Json file at a batch size?

I am currently researching how to read data with Cosmos DB. Our current approach is a .NET Core C# application that uses the Cosmos DB SDK: it reads the entire data set from a file blob, CSV, or JSON file, then loops over the records one by one, pulling each item's information from Cosmos DB to compare, insert, or update it. This feels inefficient.
We're curious whether Cosmos DB can read a batch of data (say, 5000 records) from a file blob, CSV, or JSON file and, similar to SQL Server, perform a bulk insert or a merge statement directly within Cosmos DB. The goal is to avoid repeating the same operation against Cosmos DB one item at a time.
I've also looked into BulkExecutor. Its BulkUpdate looks like a straightforward way of updating items directly, without considering whether each one should be updated. In my case, for example, if I have 1000 items and only 300 items' properties have changed, I need to update just those 300 and leave the remaining 700 untouched. So I need a way to have Cosmos DB do the comparison across a collection rather than inside a loop over single items; it could either perform the update or output a collection that I can use for updating later.
Could the .NET SDK application do that, or could a Cosmos DB stored procedure handle a similar job? Any other Azure tool is welcome as well!
What you are looking for is the Cosmos DB Bulk Executor library.
It is designed to operate on millions of records in bulk and is very efficient.
You can find the .NET documentation here
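For the question's case where only some items changed, the comparison can be done client-side before any bulk call. The following is a rough Python sketch under stated assumptions: documents are dictionaries matched on an `id` field, the stored copies have already been fetched, and `select_changed` and `chunked` are hypothetical helper names. No Cosmos DB or Bulk Executor API is used here.

```python
def select_changed(incoming, stored):
    """Return the incoming documents that are new or differ from the stored copy.

    Only the fields present in the incoming document are compared, because
    stored documents also carry system properties (for example _etag and _ts)
    that the source file will not contain.
    """
    stored_by_id = {doc["id"]: doc for doc in stored}
    changed = []
    for doc in incoming:
        current = stored_by_id.get(doc["id"])
        if current is None:
            changed.append(doc)  # new item, not yet in the store
        elif any(current.get(key) != value for key, value in doc.items()):
            changed.append(doc)  # at least one property differs
    return changed


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

The resulting batches could then be handed to whichever bulk import or update call is in use, so the unchanged items are never written.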

CosmosDB Container without PartitionKey

I'm using the Azure Cosmos DB .NET SDK version 3.0 and I want to create a container programmatically without a partition key. Is that possible? I always get an error saying: Value cannot be null.
Parameter name: partitionKey
I use the method CosmosContainers.CreateContainerIfNotExistsAsync.
I can always reproduce your issue on my side.
Note that the exception is caused by the method below:
You can decompile the DLL to see the detailed logic.
It seems this check cannot be bypassed so far, because the Cosmos DB team is planning to deprecate the ability to create non-partitioned containers, as they do not allow you to scale elastically. (Mentioned in my previous case: Is it still a good idea to create comos db collection without partition key?)
However, you can still create non-partitioned containers with the DocumentDB .NET package or the REST API.

cosmosdb - archive data older than n years into cold storage

I researched several places and could not find any direction on what options are there to archive old data from cosmosdb into a cold storage. I see for DynamoDb in AWS it is mentioned that you can move dynamodb data into S3. But not sure what options are for cosmosdb. I understand there is time to live option where the data will be deleted after certain date but I am interested in archiving versus deleting. Any direction would be greatly appreciated. Thanks
I don't think there is a single-click built-in feature in CosmosDB to achieve that.
Still, since you said you would appreciate any direction, I suggest you consider the DocumentDB Data Migration Tool.
Notes about the Data Migration Tool:
You can specify a query to extract only the cold data (for example, by a creation date stored within the documents).
It supports exporting to various targets (JSON file, blob storage, DB, another Cosmos DB collection, etc.).
It compacts the data in the process: it can merge documents into a single array document and zip it.
Once you have the configuration set up, you can script this to be triggered automatically using your favorite scheduling tool.
You can easily reverse the source and target to restore the cold data to the active store (or to dev, test, backup, etc.).
To remove the exported data you could use the TTL feature you mentioned, but that could cause data loss should your export step fail. I would suggest writing and executing a stored procedure that queries and deletes all exported documents with a single call. That stored procedure would not execute automatically, but it could be included in the automation script and executed only if the data was exported successfully first.
See: Azure Cosmos DB server-side programming: Stored procedures, database triggers, and UDFs.
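The ordering argued for above (delete only after a confirmed export) can be sketched as a small automation step. This is an illustration in Python, not a real integration: `export_cold_data` and `delete_exported` are hypothetical callables standing in for the Data Migration Tool run and the stored-procedure call, and the returned dictionary shape is an assumption.

```python
def archive_then_delete(export_cold_data, delete_exported):
    """Run the export step, and delete only if it succeeded.

    Both arguments are hypothetical callables: export_cold_data() returns the
    list of exported document ids (or raises on failure); delete_exported(ids)
    returns how many documents were removed.
    """
    try:
        exported_ids = export_cold_data()
    except Exception as exc:
        # Export failed: leave the source data untouched.
        return {"exported": 0, "deleted": 0, "error": str(exc)}
    if not exported_ids:
        # Nothing was exported, so there is nothing to delete.
        return {"exported": 0, "deleted": 0, "error": None}
    deleted = delete_exported(exported_ids)
    return {"exported": len(exported_ids), "deleted": deleted, "error": None}
```

Passing the exported ids to the delete step, rather than re-running the cold-data query, keeps the delete limited to documents that are known to have been archived.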
UPDATE:
Cosmos DB has since added the change feed, which really simplifies writing a carbon copy somewhere else.
