Optimize ReplaceDocumentAsync with property check in CosmosDb - azure-cosmosdb

I am using DocumentDb and would like to replace a document only if a particular property of the document has a certain value. Note that all stored documents have this property (and the value can never be empty).
The only way I found is to do this in 3 steps:
1) Read the document with ReadDocumentAsync
2) Check if the resource response has the property value I expect
3) If step 2 returns true then do the replace with ReplaceDocumentAsync, otherwise do something else
I am concerned about the additional request charge and latency, as this takes 2 requests to the db. Is that the only way with the current .NET SDK, or is there a cleverer way?
Thank you

You could optimize this by using a stored procedure, which executes directly in the database. The order of operations would be the same, and you would include your document as part of the payload to the SPROC, but the read, the check and the replace would all happen in a single request, so there would be no extra round trips or latency.

How does Cosmos DB Continuation Token work?

At first sight, it's clear what the continuation token does in Cosmos DB: attaching it to the next query gives you the next set of results. But what does "next set of results" mean exactly?
Does it mean:
1) The next set of results as if the original query had been executed completely without paging at the time of the very first query (skipping the appropriate number of documents)?
2) The next set of results as if the original query had been executed now (skipping the appropriate number of documents)?
3) Something completely different?
Option 1 would seem preferable but unlikely, given that the server would need to store an unbounded amount of state. Option 2 is also problematic, as it may result in inconsistencies: for example, the same document may be served multiple times across pages if the underlying data has changed between the page queries.
Cosmos DB query executions are stateless at the server side. The continuation token is used to recreate the state of the index and track progress of the execution.
"Next set of results" means that the query is executed again, starting from a "bookmark" left by the previous execution. This bookmark is provided by the continuation token.
1) Documents created during continuations
They may or may not be returned, depending on where the insert falls relative to the bookmark and on the query being executed.
Example:
SELECT * FROM c ORDER BY c.someValue ASC
Let us assume the bookmark is at someValue = 10, so the query engine resumes processing from the continuation token at someValue = 10.
If you insert a new document with someValue = 5 between query executions, it will not show up in the next set of results.
If the new document is inserted in a "page" that comes after the bookmark (for example, someValue = 15), it will show up in the next set of results.
2) Documents updated during continuations
The same logic as above applies to updates as well (see #4).
3) Documents deleted during continuations
They will not show up in the next set of results.
4) Chances of duplicates
Consider the query below:
SELECT * FROM c ORDER BY c.remainingInventory ASC
If remainingInventory is updated after the first set of results is served and the document now satisfies the ORDER BY criteria for the second page, the document will show up again.
Cosmos DB doesn’t provide snapshot isolation across query pages.
However, as per the product team this is an incredibly uncommon scenario because queries over continuations are very quick and in most cases all query results are returned on the first page.
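The bookmark behaviour described in this answer can be illustrated with a toy model. This is only a sketch under stated assumptions: the real continuation token is opaque and its format is not documented here, so the `(some_value, id)` tuple, the `Doc` class and the `query_page` function are all hypothetical stand-ins, and the in-memory list stands in for the collection.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    id: str
    some_value: int


def query_page(docs, page_size, token=None):
    """Serve one page ordered by some_value, resuming after the bookmark.

    The token is simply the sort key of the last document served; the
    "server" keeps no other state between calls (assumption of this model).
    """
    ordered = sorted(docs, key=lambda d: (d.some_value, d.id))
    if token is not None:
        ordered = [d for d in ordered if (d.some_value, d.id) > token]
    page = ordered[:page_size]
    more = len(ordered) > page_size
    next_token = (page[-1].some_value, page[-1].id) if more else None
    return page, next_token


docs = [Doc("a", 1), Doc("b", 10), Doc("c", 20), Doc("d", 30)]
page1, token = query_page(docs, page_size=2)
page1_ids = [d.id for d in page1]

# Changes made between the two page requests:
docs.append(Doc("e", 5))                  # inserted before the bookmark
docs.append(Doc("f", 15))                 # inserted after the bookmark
docs[0].some_value = 25                   # "a" was already served, now sorts later
docs = [d for d in docs if d.id != "d"]   # deleted before it was served

page2, token2 = query_page(docs, page_size=10, token=token)
page2_ids = [d.id for d in page2]
```

In this model the second page contains "f" (inserted after the bookmark) but not "e" (inserted before it), omits the deleted "d", and serves "a" a second time because its sort key moved past the bookmark.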
Based on preliminary experiments, the answer seems to be option #2, or more precisely:
Documents created after serving the first page are observable on subsequent pages
Documents updated after serving the first page are observable on subsequent pages
Documents deleted after serving the first page are omitted on subsequent pages
Documents are never served twice
The first statement above contradicts information from MSFT (cf. Kalyan's answer). It would be great to get a more qualified answer from the Cosmos DB Team specifying precisely the semantics of retrieving pages. This may not be very important for displaying data in the UI, but may be essential for data processing in the backend, given that there doesn't seem to be any way of disabling paging when performing a query (cf. Are transactional queries possible in Cosmos DB?).
Experimental method
I used Sacha Bruttin's Cosmos DB Explorer to query a collection with 5 documents, because this tool allows playing around with the page size and other request options.
The page size was set to 1, and Cross Partition Queries were enabled. Different queries were tried, e.g. SELECT * FROM c or SELECT * FROM c ORDER BY c.name.
After retrieving page 1, new documents were inserted, and some existing documents (including documents that should appear on subsequent pages) were updated and deleted. Then all subsequent pages were retrieved in sequence.
(A quick look at the source code of the tool confirmed that ResponseContinuationTokenLimitInKb is not set.)

How can I get a document at a specific index after orderBy

I have some code like this:
...
const snapshot = firestore().collection("orders").orderBy("deliveryDate")
...
I want to access only the 100th order in the returned documents. So far, the only way I have found to achieve this is to run firestore().collection("orders").orderBy("deliveryDate").limit(100), which returns the first 100 documents, and then access the last one. But I end up fetching 99 unwanted documents, and this could become much slower if I want the 200th document or higher.
So, I basically want to know if there's a possible way of getting just the index I want after sorting.
As far as I know, startAt() and startAfter() only accept a doc reference or field values, not an index/offset
Firestore does not offer any way to offset by some numeric amount to web and mobile clients (and doing so would end up having the exact same cost as what you're doing now).
If you need to impose some sort of offset into your collection, you will need to maintain that in the document itself for querying, or use some other type of storage that gives you fast cheap access by index.

cosmos db multiple access conditions

Is it possible in some way to have multiple access conditions that prevent the document from being saved to Cosmos if they aren't met?
Today I have an access condition on the ETag, to prevent an old version of the document from being saved. But I want to have another condition based on the status of the document: if the document in my store has a 'closed' status, nobody should be able to modify it.
I can always do a load -> check -> save routine, but the access condition works like a charm for the ETag, so I wonder if there is a way to specify multiple access conditions when saving the document.
Best Regards
Magnus
Based on the detailed statements in the blogs below,
1. https://codeopinion.com/documentdb-optimistic-concurrency/
2. https://chapsas.com/understanding-optimistic-concurrency-in-cosmos-db/
the ETag in Cosmos DB only provides optimistic concurrency. It can be used with an AccessCondition to ensure that the write is rejected if the document changed between its retrieval and the attempt to modify it.
AccessConditionType only has the values IfMatch and IfNoneMatch; there are no other options.
So, back to your requirements: it seems that you have to add an ifClosed item to your document and check it when you make a modification.
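A minimal sketch of how the load -> check -> save routine can be combined with the existing ETag condition. It relies on the fact that every write gives the document a new ETag, so a status change made by someone else after the load also causes the IfMatch check to fail. `ToyStore`, `PreconditionFailed` and `replace_unless_closed` are hypothetical names for an in-memory model, not the Cosmos DB SDK; the `status` and `total` fields are assumptions for illustration.

```python
import uuid


class PreconditionFailed(Exception):
    """Stands in for the HTTP 412 returned when an If-Match ETag check fails."""


class ToyStore:
    """Hypothetical in-memory stand-in for a collection; not the Cosmos DB SDK."""

    def __init__(self):
        self._docs = {}

    def upsert(self, doc):
        # Every successful write assigns a fresh _etag.
        stored = dict(doc, _etag=uuid.uuid4().hex)
        self._docs[stored["id"]] = stored
        return dict(stored)

    def read(self, doc_id):
        return dict(self._docs[doc_id])

    def replace(self, doc, if_match):
        # Reject the write if the stored document changed since it was read.
        if self._docs[doc["id"]]["_etag"] != if_match:
            raise PreconditionFailed(doc["id"])
        return self.upsert(doc)


def replace_unless_closed(store, doc_id, changes):
    """Load, check status, then save guarded by the ETag that was just read."""
    current = store.read(doc_id)
    if current["status"] == "closed":
        return None  # business rule: closed documents are read-only
    return store.replace({**current, **changes}, if_match=current["_etag"])


store = ToyStore()
store.upsert({"id": "1", "status": "open", "total": 10})

updated = replace_unless_closed(store, "1", {"total": 12})

# A second writer closes the document after we took a (now stale) copy.
stale = store.read("1")
store.replace({**stale, "status": "closed"}, if_match=stale["_etag"])

try:
    store.replace({**stale, "total": 99}, if_match=stale["_etag"])
    stale_write_rejected = False
except PreconditionFailed:
    stale_write_rejected = True

blocked = replace_unless_closed(store, "1", {"total": 99})
```

Here the stale write is rejected by the ETag check even though the writer never looked at the status again, and a fresh attempt is stopped by the status check itself. This still costs a read plus a write per modification.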

How can I query for all new and updated documents since last query?

I need to query a collection and return all documents that are new or updated since the last query. The collection is partitioned by userId. I am looking for a value that I can use (or create and use) that would help facilitate this query. I considered using _ts:
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value]
The problem with _ts is that it is not granular enough and the query could miss updates made in the same second by another client.
In SQL Server I could accomplish this using an IDENTITY column in another table. Let's call the table version. In a transaction I would create a new row in the version table and do the updates to the other table (including updating the version column with the new value). To query for new and updated rows I would use a query like this:
SELECT * FROM table WHERE userId=[some-user-id] and version > [some-value]
How could I do something like this in Cosmos DB? The Change Feed seems like the right option, but without the ability to query the Change Feed, I'm not sure how I would go about this.
In case it matters, the (web/mobile) clients connect to data in Cosmos DB via a web api. I have control of the entire stack - from client to back-end.
As stated in this link:
Today, you see all operations in the change feed. The functionality
where you can control change feed, for specific operations such as
updates only and not inserts is not yet available. You can add a “soft
marker” on the item for updates and filter based on that when
processing items in the change feed. Currently change feed doesn’t log
deletes. Similar to the previous example, you can add a soft marker on
the items that are being deleted, for example, you can add an
attribute in the item called "deleted" and set it to "true" and set a
TTL on the item, so that it can be automatically deleted. You can read
the change feed for historic items, for example, items that were added
five years ago. If the item is not deleted you can read the change
feed as far as the origin of your container.
So the change feed on its own does not meet your requirements.
My idea:
Use an Azure Function with a Cosmos DB trigger to collect all the operations in your specific Cosmos DB collection. Follow this document to configure the input of the Azure Function as Cosmos DB, then follow this document to configure the output as Azure Queue storage.
Get the ids of the changed items and send them to queue storage as messages. When you want to query the changed items, consume the messages from the queue at a fixed interval and then clear the entire queue. No items will be missed.
With your approach, you can get the added/updated documents and save a reference value (the _ts and id fields) somewhere (such as a blob):
SELECT * FROM collection WHERE userId=[some-user-id] AND _ts > [some-value] and id !='guid' order by _ts desc
This is similar to the approach we use to read data from Event Hubs, where we store checkpoint information (epoch number, sequence number and offset value) in a blob. Only one function at a time can take a lease on that blob.
If you go with the change feed, you can create a listener (Function or Job) that receives all added/updated data from the collection and stores those values in another collection; while saving the data you can add an identity/version field to every document. This approach may increase your Cosmos DB bill.
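The checkpoint idea can be illustrated with a toy in-memory model. This is a sketch of one possible variation, not the exact query above: it assumes the checkpoint stores the last _ts together with the (id, _etag) pairs already seen in that second, and that the query uses >= instead of >, so writes landing in the same second as the checkpoint are still picked up. The `write` and `changed_since` helpers, the integer timestamps and the `etag-N` values are all hypothetical.

```python
import itertools

_etag_counter = itertools.count(1)


def write(store, doc_id, user_id, now_seconds, **fields):
    """Insert or update a document; _ts has one-second granularity."""
    store[doc_id] = {"id": doc_id, "userId": user_id, "_ts": now_seconds,
                     "_etag": f"etag-{next(_etag_counter)}", **fields}


def changed_since(store, user_id, checkpoint):
    """Return documents changed since the checkpoint, plus the new checkpoint.

    checkpoint = (last _ts seen, set of (id, _etag) pairs seen at that _ts).
    """
    last_ts, seen = checkpoint
    hits = [d for d in store.values()
            if d["userId"] == user_id
            and d["_ts"] >= last_ts                  # inclusive on purpose
            and (d["id"], d["_etag"]) not in seen]   # skip what was already read
    hits.sort(key=lambda d: d["_ts"])
    if not hits:
        return hits, checkpoint
    new_ts = hits[-1]["_ts"]
    new_seen = {(d["id"], d["_etag"]) for d in hits if d["_ts"] == new_ts}
    if new_ts == last_ts:
        new_seen |= seen
    return hits, (new_ts, new_seen)


store = {}
checkpoint = (0, set())

write(store, "a", "u1", 100)
write(store, "b", "u1", 100)
first, checkpoint = changed_since(store, "u1", checkpoint)

# Two more writes land in the same second as the checkpoint.
write(store, "c", "u1", 100)
write(store, "a", "u1", 100, note="edited")

missed_by_strict_query = [d["id"] for d in store.values() if d["_ts"] > 100]
second, checkpoint = changed_since(store, "u1", checkpoint)
third, checkpoint = changed_since(store, "u1", checkpoint)
```

In this model a strict `_ts >` filter returns nothing for the same-second writes, while the inclusive filter with the seen set returns the new document and the re-edited one exactly once.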
This is what the transaction consistency levels are for: https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels
Choose strong consistency and your queries will always return the latest write.
Strong: Strong consistency offers a linearizability guarantee. The
reads are guaranteed to return the most recent committed version of an
item. A client never sees an uncommitted or partial write. Users are
always guaranteed to read the latest committed write.

Modality work list - Which items are returned for C-FIND request of a sequence?

My question is a really basic one. Suppose I query a modality worklist to get some work items with a C-FIND request, using a sequence (SQ) as a Return Key attribute, for example [0040,0100] (Scheduled Procedure Step), with universal matching.
What should I expect in the SCP's C-FIND response? Or rather, what should I expect to find regarding the scheduled procedure step for a specific work item? All the mandatory items that the Modality Worklist Information Model declares as encapsulated in the sequence? Or should I explicitly issue a C-FIND request for the keys I want the SCP to return in the response?
For example: if I want the SCP to return the Scheduled Procedure Step Start Time and Scheduled Procedure Step Start Date, do I need to issue a C-FIND request with those specific keys, or is querying for the Scheduled Procedure Step key enough to force the SCP to send all items related to the Scheduled Procedure Step itself?
Yes, you should include the Scheduled Procedure Step Start Time / Date tags in the 0040,0100 sequence.
See also Service Class Specifications (K.6.1.2.2).
This does not guarantee that you will receive this information, because which attributes are returned depends on the Modality Worklist provider.
You could also request a DICOM Conformance Statement from the Modality Worklist provider to find out which tags are needed for the request and which are returned.
As for table K.6-1, you can consider it as showing only the requirements on the SCP side: which attributes the SCP is required to support as matching keys (i.e. query filters) and which additional attribute values it is required to return (i.e. Return Keys) on a successful match. It is up to the SCP's implementation to support matching against the required keys, but you can always expect the SCP to use the values in the matching keys as the query filter.
Also note that the SCP is only required to return values for attributes that are present in the C-FIND request. One exception is sequence matching, where there is a mechanism similar to universal matching: you can pass a zero-length item to retrieve the entire sequence. So, as stated in PS 3.4 section C.2.2.2.6, you can just include an empty ITEM (FFFE, E000) element under the Scheduled Procedure Step Sequence (0040, 0100), which has a VR of SQ, for universal matching.
