Azure Cosmos DB - Gremlin API to clone existing collection into another collection

I have created a Gremlin API database in Azure Cosmos DB and have data in one collection.
Is there a way to clone that data into a collection in another database?
I want to copy graph data from the dev environment to the stage and prod environments.

You can use the existing tools for the Cosmos DB SQL API (formerly known as DocumentDB); Cosmos DB lets you query your graph through the SQL API as well.
A query like SELECT * FROM c fetches the JSON representation of how Cosmos DB stores your graph data.
The simplest approach is the Cosmos DB Data Migration Tool:
1. Set the input source to Cosmos DB SQL API/DocumentDB, use your dev endpoint, and set the query to SELECT * FROM c.
2. Set the output type to JSON and export your data.
3. Use the exported JSON as the input source, set your prod graph database as the output (choose DocumentDB/Cosmos DB SQL API as the output type), and run it.
This pushes your dev graph data to prod.
You can also use other Azure tools, such as Data Factory, that work with DocumentDB.
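For illustration, here is a minimal sketch of reading the raw graph JSON through the SQL API with the .NET DocumentDB SDK; the endpoint, key, and database/collection names are placeholders, not values from the question:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

static async Task DumpGraphJsonAsync()
{
    // Placeholder endpoint and key - substitute your dev account's values.
    var client = new DocumentClient(
        new Uri("https://your-dev-account.documents.azure.com:443/"), "<primary-key>");
    var collectionUri = UriFactory.CreateDocumentCollectionUri("TheDatabase", "TheCollection");

    // The graph is stored as ordinary JSON documents, so a plain SQL query
    // returns the same vertex and edge documents the migration tool exports.
    var query = client.CreateDocumentQuery<Document>(
            collectionUri,
            "SELECT * FROM c",
            new FeedOptions { EnableCrossPartitionQuery = true })
        .AsDocumentQuery();

    while (query.HasMoreResults)
    {
        foreach (Document doc in await query.ExecuteNextAsync<Document>())
        {
            Console.WriteLine(doc);
        }
    }
}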

I just used CosmicClone to clone a Cosmos DB graph database from one account to another: https://github.com/microsoft/CosmicClone. It cloned 500k records in 20 minutes. It looks like it would also work within a database to clone a single collection.

Related

CosmosDB Zone Redundancy using Azure Libraries for .NET

I currently create a CosmosDB with the following properties:
cosmosDb = await azure.CosmosDBAccounts
.Define(cosmosDbResource.Name)
.WithRegion(cosmosDbResource.Region)
.WithExistingResourceGroup(cosmosDbResource.ResourceGroup.Name)
.WithKind(DatabaseAccountKind.GlobalDocumentDB)
.WithStrongConsistency()
.WithTags(cosmosDbResource.ResourceGroup.Tags)
.CreateAsync();
The only place I have seen where Zone Redundancy can be set is on the read replication database, like so:
cosmosDb = await azure.CosmosDBAccounts
.Define(cosmosDbResource.Name)
.WithRegion(cosmosDbResource.Region)
.WithExistingResourceGroup(cosmosDbResource.ResourceGroup.Name)
.WithKind(DatabaseAccountKind.GlobalDocumentDB)
.WithStrongConsistency()
.WithReadReplication(Region.USEast, true)
.WithTags(cosmosDbResource.ResourceGroup.Tags)
.CreateAsync();
The problem is that I don't care about a read replication database; I want to set Zone Redundancy on the initial database I create. I noticed that when I create a Cosmos DB manually in the Azure Portal, it gives me the option to set Zone Redundancy. Is this not possible via the Azure Libraries for .NET SDK?
To specify a write region with Zone Redundancy, do the following:
.WithWriteReplication(Region.USWest2, true)
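For context, folding that into the question's fluent chain would look something like this (an untested sketch; WithWriteReplication is the only change from the original call):

cosmosDb = await azure.CosmosDBAccounts
    .Define(cosmosDbResource.Name)
    .WithRegion(cosmosDbResource.Region)
    .WithExistingResourceGroup(cosmosDbResource.ResourceGroup.Name)
    .WithKind(DatabaseAccountKind.GlobalDocumentDB)
    .WithStrongConsistency()
    // Zone-redundant write region, instead of the separate read replica.
    .WithWriteReplication(Region.USWest2, true)
    .WithTags(cosmosDbResource.ResourceGroup.Tags)
    .CreateAsync();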
PS: If at all possible, I would recommend you use the Auto-rest generated version of this SDK. The fluent API is generally not as up to date as the Auto-rest generated APIs. The latter is built directly off of the Cosmos DB Swagger spec, and everything downstream is built upon it, including ARM, PowerShell, and the CLI.
There is also a repository with a fairly complete set of examples that you can use to help build your own management libraries: Cosmos DB Samples. It includes fluent samples too, though they are likewise out of date.
This is the repo for the Auto-rest generated SDK: Cosmos DB Management SDK for .NET

Can Cosmos DB read data from File Blob or Csv or Json file at a batch size?

I am currently researching how to read data into Cosmos DB. Our current approach is a .NET Core C# application that uses the Cosmos DB SDK to read the entire contents of a file blob, CSV, or JSON file, then loops over the records one by one, pulling each item's information from Cosmos DB to compare/insert/update. This feels inefficient.
We're curious whether Cosmos DB can read a batch of data (say, 5000 records) from a file blob, CSV, or JSON file and, similar to SQL Server, perform a bulk insert or merge directly inside Cosmos DB. The goal is to avoid repeating the same operation one item at a time against Cosmos DB.
I've looked into BulkExecutor as well; BulkUpdate looks like a straightforward way to update an item directly without first checking whether it should be updated. In my case, if I have 1000 items and only 300 of them have changed properties, I only want to update those 300 without touching the remaining 700. Basically, I need Cosmos DB to do the data comparison over a collection rather than inside a per-item loop, and either perform the update or output a collection I can use for a later update.
Could a (.NET + SDK) application do that, or could a Cosmos DB stored procedure handle a similar job? Any other Azure tool is welcome as well!
What you are looking for is the Cosmos DB Bulk Executor library.
It is designed to operate on millions of records in bulk and is very efficient.
You can find the .NET documentation here
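A minimal sketch of the import path with that library, assuming client is an initialized DocumentClient and collection is the target DocumentCollection (the variable names are illustrative):

using System;
using Microsoft.Azure.CosmosDB.BulkExecutor;
using Microsoft.Azure.CosmosDB.BulkExecutor.BulkImport;

IBulkExecutor bulkExecutor = new BulkExecutor(client, collection);
await bulkExecutor.InitializeAsync();

// enableUpsert: true overwrites existing documents, covering the
// compare/insert/update flow in a single bulk call.
BulkImportResponse response = await bulkExecutor.BulkImportAsync(
    documents: documentsToImport,
    enableUpsert: true,
    disableAutomaticIdGeneration: true);

Console.WriteLine($"Imported {response.NumberOfDocumentsImported} documents ({response.TotalRequestUnitsConsumed} RUs).");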

How to delete all data in a partition?

I have a CosmosDB collection with a number of different partitions. I want to delete all of the data in one of the partitions so I tried to run the command:
db.myCollection.deleteAll({PartitionKey: 'pop-9q'})
where PartitionKey is the field I partition/shard on. But when I execute this, it returns the unhelpful message:
ERROR: An Error has occurred
Why would I be getting this message and how can I either get more details on the cause or find a resolution?
Currently you are unable to perform a bulk delete. Please upvote and comment on this feature request: Add the ability to delete ALL data in a partition
Additionally, which API are you consuming? For the Gremlin API you could execute something like the following: g.V().drop()
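If you only need to clear one partition rather than the whole graph, you can filter before dropping; for example, using the PartitionKey property name from the question (adjust to your actual key):
g.V().has('PartitionKey', 'pop-9q').drop()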
The Microsoft.Azure.Cosmos SDK has since added this ability; it is currently only available as a preview feature (which requires you to opt in via the portal).
See here for more details:
https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-delete-by-partition-key?tabs=dotnet-example
Sample code included there:
// Get reference to the container
var container = cosmosClient.GetContainer("DatabaseName", "ContainerName");

// Delete by logical partition key
ResponseMessage deleteResponse = await container.DeleteAllItemsByPartitionKeyStreamAsync(new PartitionKey("Contoso"));

if (deleteResponse.IsSuccessStatusCode)
{
    Console.WriteLine("Delete all documents with partition key operation has successfully started");
}
As Mike said, a "delete all data in a partition" feature is not yet supported in the Cosmos DB SQL API or Mongo API. I notice you have already commented on the feedback item linked above. As a workaround for the SQL API, you can use a bulk-delete stored procedure.
(sample code: https://gist.github.com/deepumi/2a23c5380202bddf0b85e83baf5833be)
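For reference, invoking a registered bulk-delete stored procedure from .NET looks roughly like this (the procedure id and query are illustrative; the gist defines the actual procedure body):

using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Assumes the gist's stored procedure is registered on the collection
// under the (illustrative) id "bulkDeleteSproc".
var sprocUri = UriFactory.CreateStoredProcedureUri("DatabaseName", "CollectionName", "bulkDeleteSproc");
var options = new RequestOptions { PartitionKey = new PartitionKey("pop-9q") };

// Runs inside the given partition and deletes documents matched by the query.
var result = await client.ExecuteStoredProcedureAsync<dynamic>(
    sprocUri, options, "SELECT c._self FROM c");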
For the Mongo API, unfortunately, even stored procedures are not supported. You could create an Azure HTTP-triggered Function to run the bulk-delete code whenever you want, or fold it into your application code.

How do you connect to a Cosmos DB (primarily updated via SQL API) using Gremlin.Net? (Can you?)

I'm working on a Cosmos DB app that stores both standard documents and graph documents. We save both types via the DocumentDB API, and I am able to run graph queries that return GraphSON using the DocumentClient.CreateGremlinQuery method. This GraphSON is read by a web app, which displays the graph for users.
My issue is that I cannot define the version of the GraphSON format returned when using the Microsoft.Azure.Graphs method. So I looked into Gremlin.Net, which from its documentation has many more options in this regard.
However, I am finding it difficult to connect to the Cosmos document DB using Gremlin.Net. Defining the server variable like this:
var server = new GremlinServer("https://localhost/", 8081, enableSsl: true, username: $"/dbs/TheDatabase/colls/TheCOllection", password: "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==");
results in a URI containing "/gremlin", and the client cannot locate the database endpoint.
Has anyone used Gremlin.Net to connect to a Cosmos database that was set up as a document DB rather than a graph DB? The documents in it are graph/Gremlin compatible in format, with _isEdge / label / _sink etc.
Cheers,
Mark (Document db/Gremlin/graph newbie)
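For reference, a Gremlin.Net connection to a Cosmos account that exposes the Gremlin endpoint is normally built from the bare hostname and port 443 rather than a full URL. A sketch with placeholder account, database, and collection names (whether this works against an account not provisioned as a graph account is exactly the open question above):

using Gremlin.Net.Driver;
using Gremlin.Net.Structure.IO.GraphSON;

// Host name only: no scheme, no trailing slash; the Gremlin endpoint listens on 443.
var server = new GremlinServer(
    "your-account.gremlin.cosmosdb.azure.com", 443,
    enableSsl: true,
    username: "/dbs/TheDatabase/colls/TheCollection",
    password: "<primary-key>");

// Cosmos DB's Gremlin endpoint speaks GraphSON 2, so pass the v2 serializers.
using (var gremlinClient = new GremlinClient(
    server, new GraphSON2Reader(), new GraphSON2Writer(), GremlinClient.GraphSON2MimeType))
{
    var count = await gremlinClient.SubmitWithSingleResultAsync<long>("g.V().count()");
}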

DocumentDB Data Migration Tool - transformDocument procedure with partitions

I am trying to convert data from SQL Server to DocumentDB, and I need to create embedded arrays in the DocumentDB documents.
I am using the DocumentDB Data Migration Tool, which describes using transformDocument with a bulk insert stored procedure... unfortunately, we are using partitioned collections, and they do not support bulk insert.
Am I missing something, or is this not currently supported?
The migration tool only supports sequential data import into a partitioned collection. To bulk-import data efficiently into a partitioned collection, follow the sample below:
https://github.com/Azure/azure-documentdb-dotnet/tree/master/samples/documentdb-benchmark
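Stripped to its essence, that sample issues inserts concurrently instead of awaiting them one at a time (client and documents are assumed to exist; the real sample also tunes the degree of parallelism per partition key range):

using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;

var collectionUri = UriFactory.CreateDocumentCollectionUri("DatabaseName", "CollectionName");

// Launch all inserts, then await them together.
await Task.WhenAll(documents.Select(doc => client.CreateDocumentAsync(collectionUri, doc)));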
