How to delete all data in a partition? - azure-cosmosdb

I have a CosmosDB collection with a number of different partitions. I want to delete all of the data in one of the partitions so I tried to run the command:
db.myCollection.deleteAll({PartitionKey: 'pop-9q'})
Where PartitionKey is the field that I partition/shard based on. But when I execute this it returns the not very helpful message:
ERROR: An Error has occurred
Why would I be getting this message and how can I either get more details on the cause or find a resolution?

At this time you are unable to perform a bulk delete. Please upvote and comment on this feature request: Add the ability to delete ALL data in a partition
Additionally, which API are you consuming? For the Gremlin API you could execute something like the following: g.V().drop()
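Note that g.V().drop() removes every vertex in the graph, not just one partition. Since the partition key is stored as a vertex property, you could scope the drop with a filter first (a sketch, assuming the property is named PartitionKey):
g.V().has('PartitionKey', 'pop-9q').drop()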

The Microsoft.Azure.Cosmos SDK has added this ability - currently only available as a preview feature (which requires you to opt in via the portal)
See here for more details:
https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/how-to-delete-by-partition-key?tabs=dotnet-example
Sample code included there:
// Get reference to the container
var container = cosmosClient.GetContainer("DatabaseName", "ContainerName");
// Delete by logical partition key
ResponseMessage deleteResponse = await container.DeleteAllItemsByPartitionKeyStreamAsync(new PartitionKey("Contoso"));
if (deleteResponse.IsSuccessStatusCode)
{
    Console.WriteLine($"Delete all documents with partition key operation has successfully started");
}

As @Mike said, a "delete all data" feature is not supported yet in the Cosmos DB SQL API or Mongo API. I notice that you have already added comments to the link above. I'll just provide a workaround here: using a bulk-delete stored procedure for the Cosmos DB SQL API.
(sample code: https://gist.github.com/deepumi/2a23c5380202bddf0b85e83baf5833be)
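For reference, a rough sketch of invoking such a stored procedure with the v2 .NET SDK; the procedure name bulkDeleteSproc, the query, and the response's continuation flag are assumptions based on the gist above:
// Rough sketch (Microsoft.Azure.DocumentDB v2 SDK, "client" is a DocumentClient).
// Assumes the bulk-delete sproc from the gist is deployed as "bulkDeleteSproc"
// and returns an object with a boolean "continuation" flag.
var sprocUri = UriFactory.CreateStoredProcedureUri("DatabaseName", "CollectionName", "bulkDeleteSproc");
var options = new RequestOptions { PartitionKey = new PartitionKey("pop-9q") };
bool continuation = true;
while (continuation)
{
    // Each call deletes one batch so the sproc stays within its execution limits.
    var result = await client.ExecuteStoredProcedureAsync<dynamic>(
        sprocUri, options, "SELECT c._self FROM c");
    continuation = (bool)result.Response.continuation;
}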
For the Mongo API, unfortunately, stored procedures are not supported either. You could create an Azure HTTP-trigger Function to execute bulk-delete code whenever you want, or merge it into your program code.

Related

CosmosDB Container without PartitionKey

I'm using Azure Cosmos DB .NET SDK Version 3.0 and I want to create a container programmatically without a partition key. Is it possible? I always get an error saying Value cannot be null.
Parameter name: partitionKey
I use the method CosmosContainers.CreateContainerIfNotExistsAsync.
I can always reproduce your issue on my side.
Notice that the exception comes from a validation method inside the SDK; if you decompile the DLL source code you can find the detailed logic.
It seems we can't get around this check, because the Cosmos DB team is planning to deprecate the ability to create non-partitioned containers, as they do not allow you to scale elastically. (Mentioned in my previous case: Is it still a good idea to create a Cosmos DB collection without a partition key?)
But you can still create non-partitioned containers with the DocumentDB .NET package or the REST API.
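For example, a minimal sketch with the legacy Microsoft.Azure.DocumentDB SDK (account URI, key, and names are placeholders):
// Sketch: the legacy SDK still lets you omit the partition key definition.
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

var client = new DocumentClient(new Uri("https://myaccount.documents.azure.com:443/"), "<auth-key>");
await client.CreateDocumentCollectionIfNotExistsAsync(
    UriFactory.CreateDatabaseUri("MyDatabase"),
    new DocumentCollection { Id = "MyNonPartitionedCollection" });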

LINQ CosmosDB MongoDB API upsert E11000 duplicate key error collection

I have just started to try out Cosmos DB with the MongoDB API, and my application is quite simple. It listens on a message queue and stores that data in the database. The data might already be stored and need to be updated, so I do an upsert.
The problem is that the update fails with a duplicate key error. I have tried to read up on this but haven't found any documentation. What I did find out is that you shouldn't set the id when doing the update, which I find hard to do.
This is the code I have:
await Ctx.ReplaceOneAsync(d => d.Id == importedData.Id, importedData, new UpdateOptions { IsUpsert = true });
And this is the error I get:
A write operation resulted in an error.
E11000 duplicate key error collection: test Failed _id or unique key constraint A bulk write operation resulted in one or more errors.
How do I do an update based on the id when using LINQ?
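One commonly reported cause (not confirmed in this thread): on a sharded (partitioned) Cosmos DB Mongo collection, the upsert filter must include the shard key as well as the id; otherwise the operation can fall through to an insert and collide on the _id/unique key constraint. A sketch, where ShardKey is a hypothetical property standing in for whatever the collection is sharded on:
// Sketch: include the shard key in the filter alongside the id, so the upsert
// targets a single logical partition. "ShardKey" is a hypothetical name.
await Ctx.ReplaceOneAsync(
    d => d.Id == importedData.Id && d.ShardKey == importedData.ShardKey,
    importedData,
    new UpdateOptions { IsUpsert = true });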

Determine if Cosmos DB NotFound due to missing collection vs. document

Is there a way to programmatically determine from a DocumentClientException where StatusCode == HttpStatusCode.NotFound whether it was the document, the collection, or the database that was not found?
I'm trying to figure out whether I can implement on-demand collection provisioning and only call DocumentClient.CreateDocumentCollectionIfNotExistsAsync when I need to. I'm trying to avoid calling it before making every request (presumably this adds an extra network roundtrip to every request). Likewise, I'm trying to avoid calling it on error recovery when I know it won't help.
From experimentation with the local emulator, the only field I see varying in these three cases is DocumentClientException.Error.Message, and only when the database cannot be found. I generally try to avoid exception dispatching based on human-readable messages.
Wrong database name:
StatusCode: HttpStatusCode.NotFound
Error.Message: {"Errors":["Owner resource does not exist"]}...
Correct database name, wrong collection name:
StatusCode: HttpStatusCode.NotFound
Error.Message: {"Errors":["Resource Not Found"]}...
Correct database name, correct collection name, incorrect document ID:
StatusCode: HttpStatusCode.NotFound
Error.Message: {"Errors":["Resource Not Found"]}...
I'm planning to use a database with its own offer. Since collections inside a database with its own offer are cheap, I'm trying to see whether I can segregate each tenant in my multi-tenant application into its own collection. Each tenant ends up having a different indexing and default TTL policy. The set of collections is not fixed and changes dynamically during runtime as new tenants sign up. I cannot predict when I will need to add a new collection. There's no new tenant notification: I just get a request that I need to handle by creating a document in a possibly non-existent collection. There's a process to garbage collect unused collections.
I'm using the NuGet package Microsoft.Azure.DocumentDB.Core Version 1.9.1 in a .NET Core 2.1 app targeting a SQL API Cosmos DB instance.
If you look at the Message property in detail, you should see one of the following strings, which tells you whether the 404 Not Found response was generated for a Document or a Collection:
ResourceType: Document
ResourceType: Collection
It's not ideal, but you can try to regex this information out of the error message.
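A minimal sketch of that approach (the helper name is mine, and it still depends on the message format, so treat it as best-effort):
// Sketch: dispatch on the "ResourceType" hint embedded in the exception
// message. Requires using System.Net; and Microsoft.Azure.Documents.
static bool IsCollectionNotFound(DocumentClientException ex) =>
    ex.StatusCode == HttpStatusCode.NotFound
    && ex.Message.Contains("ResourceType: Collection");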

GCP encryption thru Beam / Dataflow APIs for Bigquery and Cloud SQL

Context: We are trying to load some CSV-format data into GCP BigQuery using GCP Dataflow (Apache Beam). As part of this we are creating the BQ tables for the first time (for each table) through the BigQueryIO API. One of the customer requirements is that the data on GCP needs to be encrypted using customer-supplied/managed encryption keys.
Problem Statement: We are not able to find any way to specify "custom encryption keys" through the APIs while creating tables. The GCP documentation details how to specify custom encryption keys through the GCP BQ console, but we could not find anything for specifying them through APIs from within the Dataflow code.
Code Snippet:
String tableSpec = new StringBuilder().append(PipelineConstants.PROJECT_ID).append(":")
.append(dataValue.getKey().target_dataset).append(".").append(dataValue.getKey().target_table_name)
.toString();
ValueProvider<String> valueProvider = StaticValueProvider.of("gs://bucket/folder/");
dataValue.getValue().apply(Count.globally()).apply(ParDo.of(new RowCount(dataValue.getKey())))
.apply(ParDo.of(new SourceAudit(runId)));
dataValue.getValue().apply(ParDo.of(new PreProcessing(dataValue.getKey())))
.apply(ParDo.of(new FixedToDelimited(dataValue.getKey())))
.apply(ParDo.of(new CreateTableRow(dataValue.getKey(), runId, timeStamp)))
.apply(BigQueryIO.writeTableRows().to(tableSpec)
.withSchema(CreateTableRow.getSchema(dataValue.getKey()))
.withCustomGcsTempLocation(valueProvider)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
Query: Could anybody let us know:
Is it possible to provide an encryption key through the Beam API?
If it's not possible with the current version, what could be a possible workaround?
Kindly let us know if additional information is required.
Customer-supplied encryption keys are a new feature; not all libraries have been updated to support them yet.
If you know the table name in advance, you can use the UI/CLI or the API to create the table, then run your normal flow to load data into that table. That might be a workaround for you.
https://cloud.google.com/bigquery/docs/customer-managed-encryption#create_table
API to create table: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/insert
You need to set this section on the table object:
"encryptionConfiguration": {
  "kmsKeyName": string
}
More details on table: https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#resource
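Since the pipeline in the question is Java/Beam, treat the following purely as an illustration of that REST payload; it is a sketch in C# using the generated Google.Apis.Bigquery.v2 client, and the project, dataset, table, and key names are placeholders:
// Sketch: create the table up front with a CMEK via the BigQuery API, so the
// Dataflow job can then load into the pre-created table.
using Google.Apis.Auth.OAuth2;
using Google.Apis.Bigquery.v2;
using Google.Apis.Bigquery.v2.Data;
using Google.Apis.Services;

var credential = GoogleCredential.GetApplicationDefault();
if (credential.IsCreateScopedRequired)
    credential = credential.CreateScoped(BigqueryService.Scope.Bigquery);

var service = new BigqueryService(new BaseClientService.Initializer
{
    HttpClientInitializer = credential,
});

var table = new Table
{
    TableReference = new TableReference
    {
        ProjectId = "my-project", DatasetId = "target_dataset", TableId = "target_table",
    },
    // Maps to the "encryptionConfiguration" section shown above.
    EncryptionConfiguration = new EncryptionConfiguration
    {
        KmsKeyName = "projects/my-project/locations/us/keyRings/my-ring/cryptoKeys/my-key",
    },
};

await service.Tables.Insert(table, "my-project", "target_dataset").ExecuteAsync();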

Symfony - Log runnable native queries when the database is down

I'm working on a Symfony app that provides a REST web service (simple HTTP requests with JSON).
The service checks some rules and inserts a few rows into two MySQL tables (write only).
For optimization reasons, even though the Doctrine bundle is available, I use native MySQL queries (with bound parameters) to insert these rows.
My need is: if for any reason the database is not available, write the "runnable" queries into a log file.
The end goal is that when the database is back, I want to be able to execute the file's content directly against the database.
Note that there is no unique constraint (the PK is a generated UUID) and no locks or transactions to handle (simple INSERT statements).
I wrote a custom SQLLogger, but when $connection->insert(...) is called, the connection fails before the logger is called.
So my question is: is there a way to get the final query (with bound parameters) without a database connection?
Or should I rewrite the mechanism that binds parameters into the query and log it myself when the database is not available?
Best regards,
Julien
As the final query with parameters is built by the database, there is just no way to build the query in PHP and be guaranteed that it will be identical to what the database would run.
The only way is to build the query without bound parameters, but this is clearly not good practice.
So I finally decided to store the whole JSON (API request body) in a file if the database is not available.
When the database is back, instead of replaying SQL queries, I can replay the original HTTP requests.
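For illustration only (the app is PHP/Symfony; this sketch is in C# and every name in it is hypothetical), the pattern boils down to:
// Hypothetical sketch of the fallback: if the insert fails because the
// database is unreachable, spool the raw JSON request body to a file so the
// original HTTP request can be replayed later.
// Requires using System.Data.Common; and System.IO.
try
{
    await InsertRowsAsync(jsonBody);   // stand-in for the normal native-SQL write path
}
catch (DbException)                    // connection-level failure
{
    // One JSON document per line; replay = re-POST each line later.
    await File.AppendAllTextAsync("failed-requests.jsonl", jsonBody + "\n");
}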
Hope this late self-answer will help someone.
Best regards.
