DocumentDB Data Migration Tool - transformDocument procedure with partitions - azure-cosmosdb

I am trying to convert data from SQL Server to DocumentDB and need to create embedded arrays in the DocumentDB documents.
I am using the DocumentDB Data Migration Tool, and its documentation describes using the transformDocument procedure with a bulk-insert stored procedure. Unfortunately we are using partitioned collections, and they do not support bulk insert.
Am I missing something, or is this not currently supported?

The migration tool only supports sequential data import into a partitioned collection. Please follow the sample below to bulk-import data efficiently into a partitioned collection.
https://github.com/Azure/azure-documentdb-dotnet/tree/master/samples/documentdb-benchmark
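The linked benchmark sample essentially opens a DocumentClient in direct/TCP mode and issues many CreateDocumentAsync calls concurrently, which works fine against a partitioned collection. A minimal sketch of that pattern is below; the endpoint, key, database/collection names, partition key path, and the generated documents are placeholders, and a real import would throttle its concurrency to stay under the provisioned RU/s.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;

class PartitionedImport
{
    // Placeholder endpoint and key for your own account.
    private const string Endpoint = "https://<your-account>.documents.azure.com:443/";
    private const string AuthKey = "<your-auth-key>";

    static async Task Main()
    {
        // Direct/TCP connectivity is what the benchmark sample uses for throughput.
        var policy = new ConnectionPolicy
        {
            ConnectionMode = ConnectionMode.Direct,
            ConnectionProtocol = Protocol.Tcp
        };

        using (var client = new DocumentClient(new Uri(Endpoint), AuthKey, policy))
        {
            Uri collectionUri = UriFactory.CreateDocumentCollectionUri("mydb", "mycoll");

            // Stand-in for your transformed SQL Server rows; embedded arrays are built
            // client-side before inserting (assumes the collection is partitioned on /partitionKey).
            IEnumerable<object> docs = Enumerable.Range(0, 5000)
                .Select(i => new { id = i.ToString(), partitionKey = $"pk{i % 10}", items = new[] { 1, 2, 3 } });

            // Issue many inserts concurrently instead of relying on a bulk-import sproc.
            // In practice, batch these and handle 429 (throttling) responses.
            var tasks = docs.Select(d => client.CreateDocumentAsync(collectionUri, d));
            await Task.WhenAll(tasks);
        }
    }
}
```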

Related

How to ingest CSV data into Kusto

I need to ingest CSV data into Kusto using a Kusto query for geographical map visualization, but I couldn't find any query to ingest CSV data. Please help me with this.
Thank you
Try the One-Click ingestion wizard or the ingest-from-storage commands (these require the data to be in an Azure Blob or ADLSv2 file).
You can also ingest using a query, as described here: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/data-ingestion/ingest-from-storage
You will need to place your files in Azure Blob Storage or Azure Data Lake Store Gen 2.
However, the One-Click ingestion wizard is advisable, as it ingests data via the Data Management component, which offers better queuing and ingestion orchestration, rather than ingesting directly into the database/table using the command shared above.
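As a rough sketch (not taken from the linked page), here is what queued ingestion of a CSV blob looks like from .NET; it routes through the Data Management (ingest-) endpoint mentioned above. The cluster, AAD application credentials, database/table names, and blob SAS URI are all placeholders.

```csharp
using System.Threading.Tasks;
using Kusto.Data;
using Kusto.Data.Common;
using Kusto.Ingest;

class CsvBlobIngest
{
    static async Task Main()
    {
        // Placeholder values; note the "ingest-" (Data Management) endpoint.
        var kcsb = new KustoConnectionStringBuilder("https://ingest-<cluster>.<region>.kusto.windows.net")
            .WithAadApplicationKeyAuthentication("<app-id>", "<app-key>", "<tenant-id>");

        using (IKustoQueuedIngestClient client = KustoIngestFactory.CreateQueuedIngestClient(kcsb))
        {
            var props = new KustoQueuedIngestionProperties("<database>", "<table>")
            {
                Format = DataSourceFormat.csv
            };

            // CSV file already uploaded to blob storage, referenced with a SAS token.
            await client.IngestFromStorageAsync(
                "https://<storageaccount>.blob.core.windows.net/<container>/data.csv?<sas-token>",
                props);
        }
    }
}
```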

Using java to stream index into elastic-search from dynamodb using trigger

I am a novice with DynamoDB and Elasticsearch.
I need to stream indexes into Elasticsearch from a DynamoDB table via a trigger using Java, i.e. whenever a new record is inserted into the DynamoDB table, the same record has to be updated in Elasticsearch.
Most of the examples available on the web are either incomplete or implemented in Python/Node.js. An explanation of how to achieve this in Java, or any links/reference articles, would be welcome.

Can Cosmos DB read data from a file blob, CSV, or JSON file in batches?

I am currently researching how to read data with Cosmos DB. Our current approach is a .NET Core C# application using the Cosmos DB SDK to read the entire contents of a file blob or a CSV or JSON file, then loop through the records one by one, pulling each item's information from Cosmos DB and comparing/inserting/updating. This feels inefficient.
We're curious whether Cosmos DB could read a batch of data (say 5000 records) from a file blob or a CSV or JSON file and, similar to SQL Server, perform a bulk insert or merge directly within Cosmos DB. Basically, the goal is to avoid repeating the same operation one item at a time against Cosmos DB.
I've also looked into BulkExecutor. BulkUpdate looks like a more straightforward way of directly updating an item without considering whether it should be updated. In my case, for example, if I have 1000 items and only 300 items' properties changed, I only need to update those 300 items, not the remaining 700. Basically, I need Cosmos DB to do the data comparison over a collection rather than inside a loop over each single item; it could either perform the update itself or output a collection that I can use for a later update.
Would the (.NET + SDK) application be able to do that, or could a Cosmos DB stored procedure handle a similar job? Any other Azure tool is welcome as well!
What you are looking for is the Cosmos DB Bulk Executor library.
It is designed to operate on millions of records in bulk and is very efficient.
You can find the .NET documentation here
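A rough sketch of what that looks like with the .NET BulkExecutor library is below. The endpoint, key, database/collection names, and the LoadChangedDocuments helper are placeholders: the "only update the 300 changed items" comparison still happens client-side, and only the changed documents are handed to BulkImportAsync with upsert enabled.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.CosmosDB.BulkExecutor;
using Microsoft.Azure.CosmosDB.BulkExecutor.BulkImport;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class BulkUpsert
{
    static async Task Main()
    {
        // Placeholder endpoint/key and database/collection names.
        var client = new DocumentClient(new Uri("https://<account>.documents.azure.com:443/"), "<auth-key>");
        DocumentCollection collection = await client.ReadDocumentCollectionAsync(
            UriFactory.CreateDocumentCollectionUri("<database>", "<collection>"));

        IBulkExecutor executor = new BulkExecutor(client, collection);
        await executor.InitializeAsync();

        // Only the documents you detected as changed (e.g. your 300 of 1000),
        // compared client-side against the blob/CSV/JSON source.
        IEnumerable<object> changedDocs = LoadChangedDocuments();

        // enableUpsert: true inserts new items and replaces existing ones in one batch.
        BulkImportResponse response = await executor.BulkImportAsync(
            documents: changedDocs,
            enableUpsert: true,
            disableAutomaticIdGeneration: true);

        Console.WriteLine($"Imported {response.NumberOfDocumentsImported} docs, " +
                          $"consumed {response.TotalRequestUnitsConsumed} RUs.");
    }

    static IEnumerable<object> LoadChangedDocuments()
    {
        // Hypothetical helper: read the source file, query Cosmos DB in pages,
        // and return only the items whose properties actually differ.
        yield break;
    }
}
```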

Azure Cosmos DB - Gremlin API to clone existing collection into another collection

I have created a Gremlin API database in Azure Cosmos DB and have data in one collection.
However, I want to know if there is a way to clone the data into another collection in another database.
I want to copy graph data from Dev environment to stage and prod environments.
You can use the existing tools for the Cosmos SQL API (earlier known as DocumentDB); Cosmos DB allows you to query the graph via the SQL API as well.
A query like "select * from c" fetches the JSON representation of how Cosmos DB stores your graph data.
The simplest approach would be using the Cosmos DB migration tool:
Set the input source to Cosmos SQL API/DocumentDB, use your dev endpoint, and use the query select * from c.
Set the output type to JSON and export your data.
Now use the downloaded JSON as the input source, set your prod graph DB as the output (choose DocumentDB/Cosmos SQL API as the output type), and run it.
This should push your dev graph data to prod.
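If you prefer to script the export rather than use the tool's UI, here is a rough sketch of the same idea with the DocumentDB .NET SDK: query the graph collection through the SQL API and dump the raw JSON documents to a file. The account endpoint, key, database/collection names, and output file name are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;
using Newtonsoft.Json;

class GraphExport
{
    static async Task Main()
    {
        // Placeholder endpoint/key; point this at the SQL API endpoint of the dev graph account.
        var client = new DocumentClient(new Uri("https://<dev-account>.documents.azure.com:443/"), "<auth-key>");
        Uri collectionUri = UriFactory.CreateDocumentCollectionUri("<database>", "<graph-collection>");

        var docs = new List<dynamic>();
        var query = client.CreateDocumentQuery<dynamic>(
                collectionUri,
                "SELECT * FROM c",
                new FeedOptions { EnableCrossPartitionQuery = true })
            .AsDocumentQuery();

        // Page through all vertices/edges in their raw JSON representation.
        while (query.HasMoreResults)
        {
            foreach (var doc in await query.ExecuteNextAsync<dynamic>())
            {
                docs.Add(doc);
            }
        }

        // Dump to a file that can be fed back into the migration tool (or another importer)
        // against the stage/prod account.
        File.WriteAllText("graph-export.json", JsonConvert.SerializeObject(docs));
    }
}
```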
You can also use other Azure tools, such as Data Factory, which work with DocumentDB.
I just used CosmicClone to clone a Cosmos DB graph database from one account to another: https://github.com/microsoft/CosmicClone. It cloned 500k records in 20 minutes. It looks like it would also work within a single DB to clone a collection.

cosmosdb - archive data older than n years into cold storage

I researched in several places and could not find any direction on what options there are to archive old data from Cosmos DB into cold storage. I see that for DynamoDB in AWS it is mentioned that you can move DynamoDB data into S3, but I'm not sure what the options are for Cosmos DB. I understand there is a time-to-live option where data will be deleted after a certain date, but I am interested in archiving rather than deleting. Any direction would be greatly appreciated. Thanks
I don't think there is a single-click built-in feature in Cosmos DB to achieve that.
Still, since you said any direction would be appreciated, I suggest you consider the DocumentDB Data Migration Tool.
Notes about the Data Migration Tool:
- You can specify a query to extract only the cold data (for example, by a creation date stored within the documents).
- It supports exporting to various targets (JSON file, blob storage, DB, another Cosmos DB collection, etc.).
- It compacts the data in the process: it can merge documents into a single array document and zip it.
- Once you have the configuration set up, you can script it to be triggered automatically using your favorite scheduling tool.
- You can easily reverse the source and target to restore the cold data to the active store (or to dev, test, backup, etc.).
To remove the exported data you could use the mentioned TTL feature, but that could cause data loss should your export step fail. I would suggest writing and executing a stored procedure to query and delete all exported documents with a single call. That SP would not execute automatically, but could be included in the automation script and executed only if the data was exported successfully first.
See: Azure Cosmos DB server-side programming: Stored procedures, database triggers, and UDFs.
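The stored procedure itself is written in JavaScript on the server; the automation script only needs to invoke it after a successful export. A minimal sketch of that invocation with the .NET SDK is below, assuming a hypothetical stored procedure named bulkDeleteExported that accepts the archive query and deletes the matching documents; the account, names, partition key value, and query are placeholders.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

class ArchiveCleanup
{
    static async Task Main()
    {
        var client = new DocumentClient(new Uri("https://<account>.documents.azure.com:443/"), "<auth-key>");

        // Hypothetical stored procedure name; its JavaScript body would query by the
        // same creation-date filter used for the export and delete the matching documents.
        Uri sprocUri = UriFactory.CreateStoredProcedureUri("<database>", "<collection>", "bulkDeleteExported");

        bool exportSucceeded = true; // set by the preceding export step in your script

        if (exportSucceeded)
        {
            // Stored procedures run per partition key, so call it once per archived partition.
            var response = await client.ExecuteStoredProcedureAsync<int>(
                sprocUri,
                new RequestOptions { PartitionKey = new PartitionKey("<pk-value>") },
                "SELECT * FROM c WHERE c.createdDate < '2018-01-01'");

            Console.WriteLine($"Deleted {response.Response} archived documents.");
        }
    }
}
```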
UPDATE:
These days Cosmos DB has added the change feed, which really simplifies writing a carbon copy somewhere else.
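For example, here is a rough sketch with the .NET SDK v3 change feed processor that copies every inserted or updated document into an archive container; the connection string and all database/container names are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

class ChangeFeedArchiver
{
    static async Task Main()
    {
        // Placeholder connection string and container names.
        var client = new CosmosClient("<connection-string>");
        Container monitored = client.GetContainer("<database>", "<collection>");
        Container leases = client.GetContainer("<database>", "leases");
        Container archive = client.GetContainer("<archive-database>", "<archive-collection>");

        // Every insert/update flows through the change feed, so a copy can be written elsewhere.
        ChangeFeedProcessor processor = monitored
            .GetChangeFeedProcessorBuilder<dynamic>(
                processorName: "coldStorageCopy",
                onChangesDelegate: async (IReadOnlyCollection<dynamic> changes, CancellationToken ct) =>
                {
                    foreach (var doc in changes)
                    {
                        await archive.UpsertItemAsync(doc, cancellationToken: ct);
                    }
                })
            .WithInstanceName("archiver-1")
            .WithLeaseContainer(leases)
            .Build();

        await processor.StartAsync();
        Console.WriteLine("Listening to the change feed; press Enter to stop.");
        Console.ReadLine();
        await processor.StopAsync();
    }
}
```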
