I have a requirement to upsert data from a REST API into Cosmos DB and also to maintain an item-level TTL for a particular time interval.
I used an ADF Copy activity to copy the data and, for the TTL, added an additional custom column on the source side with the hardcoded value 30.
I noticed that the time interval (in seconds) is written as a string instead of an integer, so the copy fails with the error below.
Details
Failure happened on 'Sink' side. ErrorCode=UserErrorDocumentDBWriteError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Documents failed to import due to invalid documents which violate some of Cosmos DB constraints: 1) Document size shouldn't exceeds 2MB; 2) Document's 'id' property must be string if any, and must not include the following charaters: '/', '', '?', '#'; 3) Document's 'ttl' property must not be non-digital type if any.,Source=Microsoft.DataTransfer.DocumentDbManagement,'
ttl mapping between the custom column and Cosmos DB
When I use ttl1 instead of ttl, the copy succeeds, but the value is stored as a string.
Any suggestions, please?
Yes, that's a known issue with additional columns in the Copy activity: even if you set the column type to int, it is converted to a string at the source.
A possible workaround is to create a Cosmos DB-triggered Azure Function and set the ttl there.
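For illustration, here is a minimal sketch of such a function using the classic function.json binding model with a Node.js handler. The connection setting name, database and collection names, and the fixed TTL of 30 seconds are assumptions, and item-level TTL only takes effect if TTL is enabled on the container. The ttl check avoids re-processing documents that the function has already patched, since writing documents back triggers the change feed again.
function.json:
{
  "bindings": [
    {
      "type": "cosmosDBTrigger",
      "name": "documents",
      "direction": "in",
      "connectionStringSetting": "CosmosConnection",
      "databaseName": "myDatabase",
      "collectionName": "myCollection",
      "createLeaseCollectionIfNotExists": true
    },
    {
      "type": "cosmosDB",
      "name": "outputDocuments",
      "direction": "out",
      "connectionStringSetting": "CosmosConnection",
      "databaseName": "myDatabase",
      "collectionName": "myCollection"
    }
  ]
}
index.js:
module.exports = async function (context, documents) {
    // Stamp a numeric ttl on documents written by the Copy activity.
    // Only touch documents that do not already carry a numeric ttl,
    // otherwise the write-back would re-trigger this function forever.
    const patched = documents
        .filter(doc => typeof doc.ttl !== "number")
        .map(doc => ({ ...doc, ttl: 30 })); // TTL in seconds, as an integer

    if (patched.length > 0) {
        context.bindings.outputDocuments = patched;
    }
};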
I have a Cosmos DB and a Synapse workspace linked. Almost everything works when using Synapse to create SQL views over the Cosmos data.
In Cosmos I have one data set with a property that is currently always zero. I know it is actually a decimal because it is a price, and future data is likely to contain decimal prices.
In Synapse I need to project this data into a SQL view where that column is correctly a decimal(19,4).
When I run an OPENROWSET query against the Cosmos data and attempt to specify the type for this property, I get the following error.
select *
from OPENROWSET(
    'CosmosDb',
    'account=myaccount;database=myDatabase;region=theRegion;key=xxxxxxxxxxxxxxx',
    [myCollection])
with (
    [salesPrice] float '$.salesPrice'
) as testQuery
I get the error:
Column 'salesPrice' of type 'FLOAT' is not compatible with external data type 'Parquet physical type: INT64', please try with 'BIGINT'.
Obviously a BIGINT here is going to fail as soon as I get a true decimal price.
I think the Parquet type is being inferred as INT64 because all the values for this column in Cosmos are zero. More generally, I guess the same problem would occur if the Cosmos property contained only (non-zero) integers.
How can I force the type of salesPrice to be a decimal or float?
(I don't want to get side tracked here on float vs decimal for monetary values, I understand the difference; this error happens either way)
UPDATE
This problem also manifests itself in another way, without specifying a schema in OPENROWSET.
In a new CosmosDb collection insert a document such as:
{
"myid" : 1,
"price" : 0
}
If I wait a minute or so I can query this document from Synapse with:
select *
from OPENROWSET(
    'myCosmosDb',
    'account=myAccount;database=myDatabase;region=myRegion;key=xxxxxxxxxxxxxxxxxxx',
    [myCollection])
as testQuery;
and I get the expected results.
Now add a second document:
{
"myid" : 1,
"price" : 1.1
}
and when I re-run the query, I get the same error:
Column 'price' of type 'FLOAT' is not compatible with external data type 'Parquet physical type: INT64', please try with 'BIGINT'
Is there any way to work around or prevent these kinds of errors?
How about setting the documents up like this:
{
"myid" : "1",
"price" : "1.1"
}
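If the values are stored as strings like that, the column can be read back as text and converted in the view. A rough sketch reusing the connection details from the question (TRY_CAST returns NULL for values that do not parse):
select
    [myid],
    try_cast([price] as decimal(19,4)) as [price]
from OPENROWSET(
    'myCosmosDb',
    'account=myAccount;database=myDatabase;region=myRegion;key=xxxxxxxxxxxxxxxxxxx',
    [myCollection])
with (
    [myid]  varchar(20) '$.myid',
    [price] varchar(40) '$.price'
) as testQuery;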
We wanted to switch over from MongoDB 4.2 to Cosmos DB, but realized that the thing preventing us from doing so is update (aggregation) pipelines. MongoDB supports them; on Cosmos DB we get a weird-looking error, Expected type object but found array., which leads us to believe they are not supported (since you provide an array of update stages rather than an update document).
Is there a way to achieve something similar with Cosmos DB methods?
Update pipelines in MongoDB allow you to update a document with multiple steps as one atomic operation. The pipeline currently looks roughly like this (a stock-keeping system that keeps track of reservations; a sketch in MongoDB syntax follows the list):
1. Set a field to a value, and set another field to a value calculated from some input and some document fields.
2. Set a boolean flag in case the calculation from step 1 yielded 0 or less.
3. Set a DateTime field to NOW in case the flag from step 2 became "false".
At the first event I get data like this:
{
'product_name':'hamam',
'quantity':'100'
}
At the second event I get data like this:
{
'product_name':'hamam',
'quantity':'70'
}
Here I want to update the values in Cosmos DB. How can I do that?
ASA supports an upsert feature for Cosmos DB if your data contains a unique document id (your sample data does not seem to have one). Please see this paragraph about upserts in ASA for Cosmos DB, and the sketch after the excerpt below.
An excerpt:
Stream Analytics integration with Azure Cosmos DB allows you to insert or update records in your container based on a given Document ID column.
If the incoming JSON document has an existing ID field, that field is automatically used as the Document ID column in Cosmos DB and any subsequent writes are handled as such, leading to one of these situations:
unique IDs lead to insert
duplicate IDs and 'Document ID' set to 'ID' leads to upsert
duplicate IDs and 'Document ID' not set leads to error, after the first document
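Based on that, a rough sketch of an ASA query that promotes product_name to the document id (the input and output aliases are invented, and this only works if product_name uniquely identifies the document you want to upsert):
SELECT
    product_name AS id,
    product_name,
    quantity
INTO [cosmos-output]
FROM [event-input]
With the id column present, a later event for the same product_name updates the earlier document instead of inserting a duplicate.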
With the Cosmos DB for MongoDB API (version 3.4), the following find query combined with the cursor sort method seems to behave incorrectly:
db.test.find({"field1": "value1"}).sort({"field2": 1})
The error occurs if all of the following conditions are met:
The default indexing policy was discarded, regardless of whether custom indexes were created afterwards using createIndex().
The find() query does not return any documents (Find(filter).Count() == 0).
The sort document defining the sort order contains only one field. It doesn't matter whether this field exists or has been indexed. (Using two fields in the sort document returns 0 hits, which is the correct behavior.)
The error also occurs if all of the following conditions are met:
The default indexing policy was discarded.
The find() query returns one or more documents.
The sort document contains exactly one field, and this field has not been indexed.
The error message:
The index path corresponding to the specified order-by item is excluded.
The malfunction occurs only with Cosmos DB; with native MongoDB (MongoDB Atlas, v4.0) it behaves correctly.
Azure Cosmos DB for MongoDB API with MongoDB 3.4 wire protocol (preview feature) is used. The problem occurs with both a MongoDB C#/.NET driver and the mongo shell.
In addition, the problem only occurs with find(). An equivalent aggregation pipeline containing $match and $sort behaves correctly.
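For completeness, the aggregation that behaves correctly looks roughly like this:
db.test.aggregate([
    { $match: { "field1": "value1" } },
    { $sort: { "field2": 1 } }
])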
Reproduction
Create an Azure Cosmos DB Account with the "Azure Cosmos DB for MongoDB API". Enable the preview feature MongoDB 3.4 (Version 3.2 has not been tested).
Create a new database
Create a new collection, define a shard key
Drop the default indexing policy (using db.test.dropIndexes() )
(Optional) Create new custom indexes
(Optional) Insert documents
Execute the following command in the mongo shell (or the equivalent code with the MongoDB C#/.NET driver):
db.test.find({"field1": "value1"}).sort({"field2": 1})
Expected result
All documents that match the query criteria. If there are none, no documents should be returned.
Actual result
Error: error: {
"_t" : "OKMongoResponse",
"ok" : 0,
"code" : 2,
"errmsg" : "Message: {\"Errors\":[\"The index path corresponding to the specified order-by item is excluded.\"]}\r\nActivityId: c50cc751-0000-0000-0000-000000000000, Request URI: /apps/[...]/, RequestStats: \r\nRequestStartTime: 2019-07-11T08:58:48.9880813Z, RequestEndTime: 2019-07-11T08:58:49.0081101Z, Number of regions attempted: 1\r\nResponseTime: 2019-07-11T08:58:49.0081101Z, StoreResult: StorePhysicalAddress: rntbd://[...]/, LSN: 359549, GlobalCommittedLsn: 359548, PartitionKeyRangeId: 0, IsValid: True, StatusCode: 400, SubStatusCode: 0, RequestCharge: 1, ItemLSN: -1, SessionToken: -1#359549, UsingLocalLSN: True, TransportException: null, ResourceType: Document, OperationType: Query\r\n, SDK: Microsoft.Azure.Documents.Common/2.4.0.0", [...]
Workaround
Adding an additional "dummy" field to the sort document prevents the error:
db.test.find({"field1": "value1"}).sort({"field2": 1, "dummyfield": 1}).count()
This workaround is not satisfactory, however, as it could falsify the result.
Am I doing something wrong, or is Cosmos DB behaving incorrectly here?
According to Microsoft support, an index needs to be created on the field being sorted; the default indexes can be dropped and custom indexes created. As for avoiding a modification of the index every time a new field is added, there is no alternative other than performing a client-side sort. Unfortunately, client-side sorting consumes a lot of CPU and memory on the client, while the index-based sort requires extra work whenever more fields need to be indexed.
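In mongo shell terms, that means creating an index on the sorted field before running the query, for example:
db.test.createIndex({ "field2": 1 })
db.test.find({ "field1": "value1" }).sort({ "field2": 1 })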
Thus I did not find a really satisfying solution:
Using the Default Indexing Policy. However, this can lead to a huge index.
Indexing all elements that need to be sorted. Every time a new element has to be indexed, this leads to a manual modification of the indexing policy.
Only using client-side sort. In my opinion this strongly limits MongoDB functionality.
Using the aggregation framework instead of the find method. This leads to increased complexity and traffic.
Migrating to native MongoDB.
db.collection.createIndex({ "$**": 1 });
I am using Microsoft.Azure.CosmosDB.BulkExecutor.IBulkExecutor.BulkImportAsync to insert documents as a batch. I have implemented unique key constraints for my Cosmos DB collection. If any of the input documents violates the constraint, the entire bulk import operation fails with a DocumentClientException. Is this expected behaviour? Or is there a way to handle the exceptions for failed documents and make sure the valid documents are inserted?
First of all, thanks to the Microsoft documentation, which explains solid scenarios around this issue:
https://learn.microsoft.com/en-us/azure/data-factory/connector-troubleshoot-guide
This error appears when we define a unique key in addition to the default id field defined by Cosmos. One possible reason is duplicate rows for the unique key in the dataset. Another possible reason is that the delta dataset we are about to load contains unique keys that are already present in the existing Cosmos dataset.
For regular batch jobs there may be updates to an existing unique key, but we cannot update an existing unique key through the batch process, because each record lands in Cosmos as a new record with a new 'id' value. Cosmos updates an existing record only when the id matches, not the unique key.
Workaround: since the unique key is already unique for every row across the entire collection, we can use the unique key value itself as the 'id' field. Now, if there are updates to fields other than the unique key, they are applied as updates, because the 'id' for a given unique key is always the same.
In SQL terms:
SELECT <unique_key_field> AS id, <unique_key_field>, field1, field2 FROM <table_name>