ScalarDB on Cosmos DB Configuration - azure-cosmosdb

I'd like to estimate the cost of running ScalarDB on Azure Cosmos DB.
Estimating Cosmos DB costs requires the following parameters, but these are not ScalarDB parameters:
API
Number of Regions
Multi-region writes
Default consistency
Indexing policy
Total data stored in transactional store
Use Analytical Store
Workload mode
Item Size
Number of properties
Point reads/sec
Creates/sec
Updates/sec
Deletes/sec
Queries/sec
Average RU charge per query
So, which configuration should we use, and how do the above parameters map to ScalarDB parameters?
I'd like to confirm the following points.
About Configuration
API -> Cassandra
Number of Regions -> If we need to write to multiple regions, set this to 2 or more.
Multi-region writes -> If the number of regions is 2 or more, set this to "Enabled".
Default consistency -> Strong. Cosmos DB itself cannot satisfy transactions across multiple partitions, but that is covered by ScalarDB.
About API calls
ScalarDB APIs vs. Cosmos DB APIs
Put: Insert without condition => createItem()
Put: Insert with condition => readItem() -> not exists -> createItem()
Put: Update with condition => readItem() -> if conditions are satisfied -> merge columns -> replaceItem()
Get => readItem()
Delete without condition => deleteItem()
Delete with condition => readItem() -> if conditions are satisfied -> deleteItem()
Scan => container.queryItems()
I think ScalarDB performs some operations before the ones above to check the transaction state. So does ScalarDB require more Cosmos DB API calls?
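To make the conditional rows in the mapping above concrete, here is a minimal sketch of that read-check-write shape using the azure-cosmos Java SDK 4.x directly (this is only my illustration, not ScalarDB's actual implementation; the endpoint, key, database, container, and the Record class with its balance column are placeholders):

import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.models.CosmosItemRequestOptions;
import com.azure.cosmos.models.PartitionKey;

public class ConditionalPutSketch {
    // Hypothetical record shape; the balance column is only for illustration.
    public static class Record {
        public String id;
        public String pk;
        public int balance;
    }

    public static void main(String[] args) {
        CosmosClient client = new CosmosClientBuilder()
                .endpoint("<account-endpoint>") // placeholder
                .key("<account-key>")           // placeholder
                .buildClient();
        CosmosContainer container = client.getDatabase("<db>").getContainer("<container>");

        // "Put: Update with condition" = 1 point read + client-side check + 1 replace.
        Record current = container
                .readItem("record-1", new PartitionKey("pk-1"), Record.class)
                .getItem();
        if (current.balance >= 100) {   // evaluate the condition client-side
            current.balance -= 100;     // merge the changed columns
            container.replaceItem(current, current.id, new PartitionKey(current.pk),
                    new CosmosItemRequestOptions());
        }
        client.close();
    }
}

If that shape is right, I would count each conditional Put as roughly one point read plus one replace when filling in the cost estimate.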

Please use the following settings.
API: Core (SQL) (not Cassandra)
Number of Regions: 1 (since Strong consistency doesn't support multi-region)
Multi-region writes: Disabled (since Strong consistency doesn't support multi-region)
Default consistency: Strong
To see which Cosmos DB APIs each ScalarDB API calls, please check the code.
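As a small illustration of those settings on the client side, here is a sketch assuming the azure-cosmos Java SDK 4.x for the Core (SQL) API (the endpoint and key are placeholders). Note that the default consistency is configured on the account itself; a client may only match or relax it, never strengthen it:

import com.azure.cosmos.ConsistencyLevel;
import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;

public class ClientSettingsSketch {
    public static void main(String[] args) {
        // Core (SQL) API account, single region, account default consistency = Strong.
        CosmosClient client = new CosmosClientBuilder()
                .endpoint("<account-endpoint>") // placeholder
                .key("<account-key>")           // placeholder
                // Matches the account default; the client can only keep or weaken
                // the account-level consistency, not strengthen it.
                .consistencyLevel(ConsistencyLevel.STRONG)
                .buildClient();
        client.close();
    }
}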

Related

Fastest way to fetch multiple documents from Azure Cosmos DB

I have a SQL API Cosmos DB collection with the id and partition key both equal to /id.
Given a list of IDs, I need to fetch all those documents. When using the .NET SDK (v3.25), which of the below Container class methods is recommended to get the lowest latency:
In parallel, use ReadItemAsync to read all documents.
Use ReadManyItemsAsync to read all the documents.
Use GetItemQueryIterator with a SQL query of the form SELECT * FROM c where c.id in ('id-1', 'id-2', ...).
If you want to retrieve a large group of individual items, the most efficient way is to use ReadManyItemsAsync() rather than invoking ReadItemAsync() many times in parallel.

Upsert Cosmos item TTL using Azure Data Factory Copy Activity

I have a requirement to upsert data from REST API to Cosmos DB and also maintain the item level TTL for particular time interval.
I have used an ADF Copy activity to copy the data, but for the TTL I used an additional custom column on the source side with a hardcoded value of 30.
I noticed that the time interval (seconds) is being written as a string instead of an integer, and it therefore fails with the below error.
Details
Failure happened on 'Sink' side. ErrorCode=UserErrorDocumentDBWriteError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Documents failed to import due to invalid documents which violate some of Cosmos DB constraints: 1) Document size shouldn't exceeds 2MB; 2) Document's 'id' property must be string if any, and must not include the following charaters: '/', '', '?', '#'; 3) Document's 'ttl' property must not be non-digital type if any.,Source=Microsoft.DataTransfer.DocumentDbManagement,'
ttl mapping between the custom column and Cosmos DB
When I use ttl1 instead of ttl, it succeeds and the value is stored as a string.
Any suggestion please?
Yes, that's the issue with additional columns in the Copy activity. Even if you set it to int, it will change to string at the source.
A possible workaround is to create a Cosmos DB trigger in an Azure Function and add the 'ttl' property there.
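The underlying constraint is simply that ttl has to reach Cosmos DB as a JSON number. As a minimal illustration outside of ADF (a sketch assuming the azure-cosmos Java SDK 4.x; the endpoint, key, database, container, and Doc class are placeholders), an upsert with an integer ttl is accepted, whereas a string value triggers the error above:

import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.CosmosContainer;

public class TtlUpsertSketch {
    // Hypothetical document; "ttl" must serialize as a JSON number, not a string.
    public static class Doc {
        public String id;
        public String pk;
        public int ttl; // item-level TTL in seconds
    }

    public static void main(String[] args) {
        CosmosClient client = new CosmosClientBuilder()
                .endpoint("<account-endpoint>") // placeholder
                .key("<account-key>")           // placeholder
                .buildClient();
        CosmosContainer container = client.getDatabase("<db>").getContainer("<container>");

        Doc doc = new Doc();
        doc.id = "item-1";
        doc.pk = "partition-1";
        doc.ttl = 30; // integer, so Cosmos DB accepts it

        // Item-level TTL only takes effect if TTL is enabled on the container.
        container.upsertItem(doc);
        client.close();
    }
}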

Azure Cosmos DB: Update pipeline supported, or Cosmos-native way to emulate this?

We wanted to switch over from MongoDB 4.2 to Cosmos DB, but realized that the thing preventing us from doing so is update (aggregation) pipelines. MongoDB supports them; on Cosmos DB we get a weird-looking error, Expected type object but found array., which prompts us to believe they are not supported (as you provide an array of update stages as opposed to an update document).
Is there a way to achieve something similar with Cosmos DB methods?
Update pipelines in MongoDB allow you to update a document with multiple steps as one atomic operation. The pipeline currently looks roughly like this (a stock-keeping system that keeps track of reservations):
Set a field to a value, and set another field to a calculated value based on some input and some document fields
Set a boolean flag in case the calculation from step 1 yielded 0 or less
Set a DateTime flag to NOW in case the calculation from step 2 triggered "false"

Using the sort() cursor method without the default indexing policy in Azure Cosmos DB for MongoDB API

With Cosmos DB for MongoDB API (Version 3.4), the following find query in combination with the method cursor sort seems to behave incorrectly:
db.test.find({"field1": "value1"}).sort({"field2": 1})
The error occurs, if all of the following conditions are met:
The default indexing policy was discarded, regardless of whether custom indexes were created afterwards using createIndex().
The find() query does not return any documents (Find(filter).Count() == 0).
The sort document defining the sort order contains only one field. It doesn't matter whether this field exists or has been indexed. (Using two fields in the sort document returns 0 hits, which is the correct behavior.)
The error also occurs, if all of the following conditions are met:
The default indexing policy was discarded
The find() query returns one or more documents
The sort document contains exactly one field. This field has not been indexed.
The error message:
The index path corresponding to the specified order-by item is excluded.
The malfunction occurs only when using Cosmos DB; with native MongoDB (MongoDB Atlas, v4.0) it behaves correctly.
The Azure Cosmos DB for MongoDB API with the MongoDB 3.4 wire protocol (a preview feature) is used. The problem occurs with both the MongoDB C#/.NET driver and the mongo shell.
In addition, the problem only occurs with find(). An equivalent aggregation pipeline containing $match and $sort behaves correctly.
Reproduction
Create an Azure Cosmos DB Account with the "Azure Cosmos DB for MongoDB API". Enable the preview feature MongoDB 3.4 (Version 3.2 has not been tested).
Create a new database
Create a new collection, define a shard key
Drop the default indexing policy (using db.test.dropIndexes() )
(Optional) Create new custom indexes
(Optional) Insert documents
Execute command in mongo shell (or the equivalent code with mongoDB C#/.NET driver):
db.test.find({"field1": "value1"}).sort({"field2": 1})
Expected result
All documents that match the query criteria. If there are none, no documents should be returned.
Actual result
Error: error: {
"_t" : "OKMongoResponse",
"ok" : 0,
"code" : 2,
"errmsg" : "Message: {\"Errors\":[\"The index path corresponding to the specified order-by item is excluded.\"]}\r\nActivityId: c50cc751-0000-0000-0000-000000000000, Request URI: /apps/[...]/, RequestStats: \r\nRequestStartTime: 2019-07-11T08:58:48.9880813Z, RequestEndTime: 2019-07-11T08:58:49.0081101Z, Number of regions attempted: 1\r\nResponseTime: 2019-07-11T08:58:49.0081101Z, StoreResult: StorePhysicalAddress: rntbd://[...]/, LSN: 359549, GlobalCommittedLsn: 359548, PartitionKeyRangeId: 0, IsValid: True, StatusCode: 400, SubStatusCode: 0, RequestCharge: 1, ItemLSN: -1, SessionToken: -1#359549, UsingLocalLSN: True, TransportException: null, ResourceType: Document, OperationType: Query\r\n, SDK: Microsoft.Azure.Documents.Common/2.4.0.0", [...]
Workaround
Adding an additional "dummy" field to the sort document prevents the error:
db.test.find({"field1": "value1"}).sort({"field2": 1, "dummyfield": 1}).count()
The workaround is not satisfactory. It could falsify the result.
Am I doing something wrong, or is Cosmos DB behaving flawed here?
According to Microsoft support, an index needs to be created on the field being sorted. The default indexes can be dropped and custom indexes created. As for the issue of not having to modify the index every time a new field is added, there is no alternative other than performing a client-side sort. Unfortunately, client-side sorting takes a lot of CPU and memory on the client, and sorting on an index requires extra work whenever more fields have to be indexed.
Thus I did not find a really satisfying solution:
Using the Default Indexing Policy. However, this can lead to a huge index.
Indexing all elements that need to be sorted. Every time a new element has to be indexed, this leads to a manual modification of the indexing policy.
Only using client-side sort. In my opinion this strongly limits MongoDB functionality.
Using the aggregation framework instead of the find method. This leads to increased complexity and traffic.
Migrating to native MongoDB.
A wildcard index covering all document paths can also be created:
db.collection.createIndex({ "$**" : 1 });

Google Datastore python returning less number of entities per page

I am using the Python client SDK for Datastore (google-cloud-datastore) version 1.4.0. I am trying to run a keys-only query fetch:
query = client.query(kind = 'SomeEntity')
query.keys_only()
The query filter has an EQUAL condition on field1 and a GREATER_THAN_OR_EQUAL condition on field2. Ordering is done based on field2.
For fetch, I am specifying a limit:
query_iter = query.fetch(start_cursor=cursor, limit=100)
page = next(query_iter.pages)
keyList = [entity.key for entity in page]
nextCursor = query_iter.next_page_token
Though there are around 50 entities satisfying this query, each fetch returns around 10-15 results and a cursor. I can use the cursor to get all the results, but this results in additional call overhead.
Is this behavior expected?
A keys-only query is limited to 1000 entries in a single call. This operation counts as a single entity read.
For other limitations of Datastore, please refer to the detailed table in the documentation.
However, in your code you specified a cursor as the starting point for a subsequent retrieval operation. The query can be limited without a cursor:
query = client.query()
query.keys_only()
tasks = list(query.fetch(limit=100))
For detailed instructions on how to use limits and cursors, please refer to the Google Cloud Datastore documentation.
