Handling DocumentClientException with BulkImport - azure-cosmosdb

I am using Microsoft.Azure.CosmosDB.BulkExecutor.IBulkExecutor.BulkImportAsync to insert documents as a batch. I have implemented unique key constraints for my Cosmos DB collection. If any of the input documents violates a constraint, the entire bulk import operation fails with a DocumentClientException. Is this expected behaviour? Or is there a way to handle the exceptions for the failed documents and make sure the valid documents are still inserted?

First of all, thanks to the Microsoft documentation, which explains the relevant scenarios:
https://learn.microsoft.com/en-us/azure/data-factory/connector-troubleshoot-guide
This error appears when we define a unique key in addition to the default id field defined by Cosmos DB. One possible reason is duplicate rows for the unique key within the dataset being loaded. Another possible reason is that the delta dataset we are about to load contains unique key values that are already present in the existing Cosmos DB data.
For regular batch jobs there may be updates to rows identified by an existing unique key, but we cannot update an existing unique key through the batch process, because each record lands in Cosmos DB as a new record with a new 'id' value. Cosmos DB updates an existing record only when the 'id' field matches, not based on the unique key.
Workaround: since the unique key is already unique for every row across the entire collection, we can use that value as the 'id' field as well. Now, if there are updates to other fields apart from the unique key, they are applied correctly, because the 'id' of the record with that unique key will be the same.
In SQL terms:
SELECT <unique_key_field> AS id, <unique_key_field>, field1, field2 FROM <table_name>
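To illustrate the workaround, here is a minimal Java sketch of that mapping, assuming the documents are built as plain maps before being handed to the bulk importer; unique_key_field, field1 and field2 are the placeholder column names from the SQL above, and the actual BulkImportAsync call is omitted:

```java
import java.util.HashMap;
import java.util.Map;

public class IdFromUniqueKey {

    // Copy the unique key value into the Cosmos DB "id" field, so re-loading the
    // same row updates the existing document instead of violating the unique key.
    static Map<String, Object> toCosmosDocument(Map<String, Object> row) {
        Map<String, Object> doc = new HashMap<>(row);
        doc.put("id", String.valueOf(row.get("unique_key_field")));
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("unique_key_field", "device-42");
        row.put("field1", "a");
        row.put("field2", "b");
        // Prints {field1=a, field2=b, id=device-42, unique_key_field=device-42} (order may vary)
        System.out.println(toCosmosDocument(row));
    }
}
```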

Related

Creating index with unique constraint for new blank field

A new column was added to an existing DB table (PA0023).
DB: HANA
The column should be unique, so I tried to create a unique index constraint via SE11.
Activation succeeded. However, while creating the index via Utilities... Database Utility, an error showed up:
Request: Create Index PA0023-Z01
...
sql:
CREATE UNIQUE INDEX 'PA0023~Z01' ON 'PA0023'
('MANDT',
'RECORD_KEY')
cannot CREATE UNIQUE INDEX; duplicate key found [5] Several documents with the same ID exist in the index;SAPABAP1:PA0023.$uc_PA0023~Z01$ content not unique, cannot define unique constraint. rowCount != distinctCount.
There are no rows with the same non-blank value in that column, but there are rows with a blank value, and those are considered duplicates. After replacing the blanks in the development environment, the index was created successfully. That is hardly feasible in production, because there are many records with an empty value in that new field.
So my question is: is there a way to create the unique constraint without replacing the blanks?
You cannot create a unique constraint if the existing data is not unique. So no, you can't do this if you have multiple NULL values for the key. You would need to ensure the data is unique before creating the constraint.
This is normal database practice; it's not HANA-specific.
While it is true that a compound primary key cannot contain any nullable columns, a compound unique/candidate key may be defined with nullable columns. The only golden rule is that when adding or updating a record, if any column in the unique key contains a NULL value, then the index entry is not written to the database.
MySQL does this by default.
SQL Server will do this provided that you add "WHERE columnX IS NOT NULL" to the key's definition.
ORACLE is the same as SQL Server, except that the syntax is more complicated.
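As a hedged illustration of the SQL Server behaviour mentioned above (not HANA), the sketch below creates a filtered unique index over JDBC so that rows with a NULL key are simply left out of the index; the connection string and index name are placeholders, and the table and column names are borrowed from the question:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FilteredUniqueIndex {
    public static void main(String[] args) throws Exception {
        // Placeholder SQL Server connection string.
        try (Connection con = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=mydb;user=sa;password=<secret>");
             Statement st = con.createStatement()) {
            // Uniqueness is enforced only across non-NULL values: rows where
            // RECORD_KEY is NULL are excluded from the index entirely.
            st.execute("CREATE UNIQUE INDEX UX_PA0023_RECORD_KEY "
                    + "ON PA0023 (MANDT, RECORD_KEY) "
                    + "WHERE RECORD_KEY IS NOT NULL");
        }
    }
}
```

Note that this only helps if the blank values are actually stored as NULL; empty strings would still count as duplicates.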

How to fetch multiple rows from DynamoDB using a non primary key

select * from tableName where columnName="value";
How can I fetch a similar result in DynamoDB using Java, without using the primary key as my attribute? (I need to group data based on the value of a particular column.)
I have gone through articles regarding batchGetItem and QuerySpec, but all of these require me to pass the primary key.
Can someone give a lead here?
Short answer is you can't. Whenever you use the Query or GetItem operations in DynamoDB you must always supply the table or index primary key.
You have two options:
Perform a Scan operation on the table and filter by columnName="value". However, this requires DynamoDB to look at every item in the table, so it is likely to be slow and expensive.
Add a Global Secondary Index to your table. This requires you to define a primary key for the index that contains the columnName you want to query.
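Both options might look roughly like the following with the AWS SDK for Java (v1); the table name, attribute name and value come from the question, while the GSI name columnName-index is an assumption and the index itself must already exist:

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.QueryRequest;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;

import java.util.Collections;
import java.util.Map;

public class NonKeyLookup {
    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
        Map<String, AttributeValue> value =
                Collections.singletonMap(":v", new AttributeValue().withS("value"));

        // Option 1: Scan with a filter expression - reads every item, then filters.
        ScanRequest scan = new ScanRequest()
                .withTableName("tableName")
                .withFilterExpression("columnName = :v")
                .withExpressionAttributeValues(value);
        System.out.println(client.scan(scan).getItems());

        // Option 2: Query a Global Secondary Index whose partition key is columnName.
        QueryRequest query = new QueryRequest()
                .withTableName("tableName")
                .withIndexName("columnName-index")
                .withKeyConditionExpression("columnName = :v")
                .withExpressionAttributeValues(value);
        System.out.println(client.query(query).getItems());
    }
}
```

The Query against the GSI reads only the matching items, whereas the Scan reads the whole table and filters afterwards.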

How to maintain different versions of a JSON document in Cosmos DB when PK is id and UK is id and version

I have a JSON document with two properties deviceIdentity, version.
Partition Key for my collection is deviceIdentity.
My JSON documents come with different versions, and I want to keep all versions of the document.
Like:
deviceIdentity1, v1
deviceIdentity1, v2
Two documents should be there.
The problem is that since my PK is deviceIdentity, it always updates the existing record, even though I have defined a unique key constraint on deviceIdentity, version.
Any pointers will be of help!
I believe you are confusing partition key with primary key.
Partition key determines how data is scaled horizontally. This should not be unique, as otherwise any read except an exact document lookup would require scanning all partitions, which would be inefficient. In your case deviceIdentity may be a suitable candidate: all versions of the same device would fall into the same partition.
Primary key is your document identity (the id field). As you already noticed, there can be only one document with a given id per partition key value, so the id field MUST be unique for every document you want to store under the same partition key. In your case, you could use a combined value like "deviceIdentity1, v2" as the identity. Or you could use a technical unique id, like a GUID.
Also, note the following from Unique keys in Azure Cosmos DB:
By creating a unique key policy when a container is created, you ensure the uniqueness of one or more values per partition key.
Meaning, if your partition key is deviceIdentity, then you don't have to duplicate deviceIdentity in the unique key constraint. A constraint on /version alone would suffice to ensure that every single partition/device has at most one document per version.
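As a minimal sketch with the azure-cosmos Java SDK (v4), the container below could be created with partition key /deviceIdentity and a unique key on /version; the database and container names, endpoint and key are all placeholders, and each document would still need its own unique id (for example a GUID or a deviceIdentity+version composite) as described above:

```java
import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.CosmosDatabase;
import com.azure.cosmos.models.CosmosContainerProperties;
import com.azure.cosmos.models.UniqueKey;
import com.azure.cosmos.models.UniqueKeyPolicy;

import java.util.Collections;

public class VersionedContainer {
    public static void main(String[] args) {
        CosmosClient client = new CosmosClientBuilder()
                .endpoint("https://<account>.documents.azure.com:443/")
                .key("<key>")
                .buildClient();
        client.createDatabaseIfNotExists("devices");
        CosmosDatabase db = client.getDatabase("devices");

        // Partition key /deviceIdentity; the unique key on /version is enforced per
        // partition, so each device can have at most one document per version.
        CosmosContainerProperties props =
                new CosmosContainerProperties("telemetry", "/deviceIdentity");
        UniqueKeyPolicy policy = new UniqueKeyPolicy()
                .setUniqueKeys(Collections.singletonList(
                        new UniqueKey(Collections.singletonList("/version"))));
        props.setUniqueKeyPolicy(policy);
        db.createContainerIfNotExists(props);
        client.close();
    }
}
```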
Thanks for all the answers.
The problem was that we have an old legacy system where "id" was already a heavily used property, but it did not have unique values.
So whenever a document arrived with a different version, it was updating the existing one, because "id" in Cosmos DB has a predefined meaning: the upsert of any arriving document is done on the unique id value, and in our case id is never unique.
The solution we found:
Whenever a document arrives, we process it in an Azure Function, swap the "id" field with the unique "deviceIdentity" value, and save it, since the structure of the JSON cannot be changed, as stated by our client. When reading these documents, we have exposed an API which does the swapping again and sends the document back to the requesting client as it was.
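One possible reading of that swap, sketched in plain Java (the real code would live in the Azure Function and the read API): exchanging the two field values is symmetric, so applying the same operation on the read path restores the original document. The field names come from the question; everything else is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class IdSwap {

    // Exchange the values of "id" and "deviceIdentity". On the write path this puts
    // the unique deviceIdentity into "id"; applying it again on the read path swaps
    // the values back, so the client receives the document exactly as it was sent.
    static Map<String, Object> swapIdAndDeviceIdentity(Map<String, Object> doc) {
        Map<String, Object> out = new HashMap<>(doc);
        out.put("id", doc.get("deviceIdentity"));
        out.put("deviceIdentity", doc.get("id"));
        return out;
    }
}
```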

Use ConditionExpression to limit insert when ID doesn't exist in other table

Simple thing: while inserting data into table A, I have a hash key id and an additional hash index on the column ex_id, which is effectively a foreign key into table B.
When inserting new data into table A, I would like to raise an exception whenever the value in column ex_id does not have a corresponding entry in table B.
I thought that ConditionExpression was the way to go, but I can't make it work - probably missing something obvious. I tried to use contains()...
Any ideas?
As far as I know, this is not possible on the DynamoDB side, because there are no relationships between tables.
What you can do is have a check at the application level, which verifies this on its own and throws an exception before inserting the value into table A. (Query table B for that id; if found, insert; otherwise throw an exception.)
DynamoDB does not natively support any kind of foreign key; everything works on a per-table, per-key basis. DynamoDB's approach is to handle such logic at the client level. For example, see the DynamoDB transactions client. This library allows you to perform transactions across tables which either all succeed or all roll back.
For your case, I would first make a getItem request to table B (using a consistent read) and, if the item exists, then write to table A, as sketched below.
Then I would enable streams on table A and write a lambda function to check if any data violations get written to the table.
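A rough sketch of that check-then-write with the AWS SDK for Java (v1); the table names come from the question, while the key attribute names ("id", "ex_id") are assumptions. Note that the check and the write are not atomic, which is the gap the transactions client mentioned above is meant to close:

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.GetItemRequest;
import com.amazonaws.services.dynamodbv2.model.PutItemRequest;

import java.util.Collections;
import java.util.Map;

public class CheckedInsert {

    static void insertIntoA(AmazonDynamoDB client, Map<String, AttributeValue> itemForA) {
        AttributeValue exId = itemForA.get("ex_id");

        // Consistent read against table B to verify the referenced item exists.
        GetItemRequest check = new GetItemRequest()
                .withTableName("B")
                .withKey(Collections.singletonMap("id", exId))
                .withConsistentRead(true);
        Map<String, AttributeValue> found = client.getItem(check).getItem();
        if (found == null) {
            throw new IllegalStateException("ex_id " + exId.getS() + " not present in table B");
        }

        // Referenced item exists, so write to table A.
        client.putItem(new PutItemRequest().withTableName("A").withItem(itemForA));
    }

    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
        Map<String, AttributeValue> item = Map.of(
                "id", new AttributeValue().withS("a-1"),
                "ex_id", new AttributeValue().withS("b-42"));
        insertIntoA(client, item);
    }
}
```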

How to design DynamoDB table to facilitate searching by time ranges, and deleting by unique ID

I'm new to DynamoDB - I already have an application where the data gets inserted, but I'm getting stuck on extracting the data.
Requirement:
There must be a unique table per customer
Insert documents into the table (each doc has a unique ID and a timestamp)
Get X number of documents based on timestamp (ordered ascending)
Delete individual documents based on unique ID
So far I have created a table with a composite key (S:id, N:timestamp). However, when I come to query it, I realise that since my id is unique and I can't do a wildcard search on id, I won't be able to extract a range of items...
So, how should I design my table to satisfy this scenario?
Edit: Here's what I'm thinking:
The primary index will be composite (S:customer_id, N:timestamp), where the customer ID will be the same within a table. This will enable me to extract data based on a time range.
The secondary index will be a hash (S:unique_doc_id), whereby I will be able to delete items using this index.
Does this sound like the correct solution? Thank you in advance.
You can satisfy the requirements like this:
Your primary key will be h:customer_id and r:unique_id. This makes sure all the elements in the table have different keys.
You will also have an attribute for timestamp and will have a Local Secondary Index on it.
You will use the LSI for requirement 3 and the batchWrite API call to do batch deletes for requirement 4.
This solution doesn't require (1): all the customers can stay in the same table. (Heads up: there is a limit of 256 tables per account before you have to contact AWS.)
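A minimal sketch of that design with the AWS SDK for Java (v1); the table and index names are placeholders, and the timestamp attribute is assumed to be a numeric epoch value. The query shows requirement 3 (the oldest X documents for a customer, ascending by timestamp):

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.*;

import java.util.Collections;

public class DocsTable {
    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();

        // Table key: hash customer_id, range unique_id. The LSI "by-timestamp"
        // re-sorts the same hash key by the timestamp attribute.
        client.createTable(new CreateTableRequest()
                .withTableName("documents")
                .withAttributeDefinitions(
                        new AttributeDefinition("customer_id", ScalarAttributeType.S),
                        new AttributeDefinition("unique_id", ScalarAttributeType.S),
                        new AttributeDefinition("timestamp", ScalarAttributeType.N))
                .withKeySchema(
                        new KeySchemaElement("customer_id", KeyType.HASH),
                        new KeySchemaElement("unique_id", KeyType.RANGE))
                .withLocalSecondaryIndexes(new LocalSecondaryIndex()
                        .withIndexName("by-timestamp")
                        .withKeySchema(
                                new KeySchemaElement("customer_id", KeyType.HASH),
                                new KeySchemaElement("timestamp", KeyType.RANGE))
                        .withProjection(new Projection().withProjectionType(ProjectionType.ALL)))
                .withBillingMode(BillingMode.PAY_PER_REQUEST));

        // Requirement 3: first 10 documents for a customer, ordered by timestamp ascending.
        QueryRequest byTime = new QueryRequest()
                .withTableName("documents")
                .withIndexName("by-timestamp")
                .withKeyConditionExpression("customer_id = :c")
                .withExpressionAttributeValues(
                        Collections.singletonMap(":c", new AttributeValue().withS("customer-1")))
                .withScanIndexForward(true)
                .withLimit(10);
        System.out.println(client.query(byTime).getItems());
    }
}
```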
