Are Document DB Triggers executed in parallel? - azure-cosmosdb

We are trying to use a DocumentDB trigger to generate auto-numbers. For this purpose we keep a special document in our collection that stores the last used number, and every other document in the collection is created through this trigger. The trigger behaves as follows:
1) Reads the last used number from the autonumber document.
2) Increments the number by 1 and saves the incremented value back to the autonumber document.
3) Creates a new document with an autoId field set to the incremented value; the remaining fields of the new document are taken from the request body.
await documentClient.CreateDocumentAsync("collectionURI", newDocument, new RequestOptions() { PreTriggerInclude = new List<string> {"autoNumbersTrigger"} });
We tested this while running the DocumentDB client locally on our machines, and even with 100K parallel inserts our trigger never ran into a concurrency problem. Hence the question: is this behavior guaranteed? Is it safe to say that the trigger behavior described above will never run into concurrency issues?

You should catch DocumentClientException and check for (int)StatusCode == 449 (Retry With), which can be returned during concurrent updates to the same document. As you've noticed, this is rare even at high write rates.
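A minimal sketch of such a retry, written in JavaScript for illustration rather than with the .NET client from the question. The helper name withRetryOn449, the numeric `code` property on the thrown error, and the backoff values are all assumptions; adapt them to however your client surfaces the status code.

```javascript
// Hypothetical helper: retries an async operation while it fails with
// HTTP status 449 ("Retry With"). The `code` property is an assumption
// about how the client exposes the status code on the error object.
async function withRetryOn449(operation, maxAttempts = 5, baseDelayMs = 50) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const retryable = Boolean(err) && err.code === 449;
      // Give up on any other error, or once the attempt budget is used.
      if (!retryable || attempt >= maxAttempts) throw err;
      // Linear backoff before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
    }
  }
}
```

The create call that includes the pre-trigger would be passed in as `operation`, so the whole read-increment-create sequence is re-run on a 449.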

Related

Firebase cloud functions incrementing queue number

I'm working on a Flutter restaurant application where each restaurant has a Cloud Firestore document with a field called queueNumber. This value starts at 1 and increases by 1 with every order.
I'm trying to make sure each order has a unique queue number. I have a cloud function that triggers whenever a new document is created in the orders collection. Here is the code.
.onCreate(async (snapshot, context) => {
  const orderData = snapshot.data();
  const id = orderData.id;
  if (orderData && orderData.restaurantId != null) {
    return restDoc.update({
      queueNumber: admin.firestore.FieldValue.increment(1)
    })
  }
});
So the user places an order with the existing queueNumber in the restaurant document. Then the cloud function increments the queueNumber so that the next request gets a queueNumber that is 1 higher than the previous one.
Here is the problem: sometimes when two orders are placed one after another, they get the same queueNumber. The end result in the restaurant document is correct, but the individual orders get the wrong number (e.g. Order 1 has 51, Order 2 has 51, and the restaurant document has 53).
Is there a way to fix this method, or a better approach to handling the queue numbers?
Thanks.
You're running into a race condition between each of the clients that's adding a document. Firestore doesn't offer a built-in way to ensure that a field is unique, nor does it offer a way to automatically and safely set a value of a field based on the contents of other documents. This wouldn't scale in the way that Firestore requires.
You should first find a way to implement your app without increasing numbers like this. Check whether a timestamp might be a better way to track the order in which documents are added. That will scale much better.
If you absolutely need increasing numbers like this, you will have to involve a whole new document just to track the latest number assigned, and use that document in a transaction when adding new documents. The transaction will have to:
1) Read the counter document
2) Increment the count value in memory
3) Create the new document with this value
4) Also update the counter document with this value
All of this must be done within the transaction, or will not be safe.
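The steps above could be sketched roughly as follows. This assumes `db` is a Firestore instance with the Admin SDK style API (collection().doc(), runTransaction(), and a boolean `exists` on snapshots). The function name and the "restaurants" collection name are hypothetical; queueNumber and the orders collection come from the question.

```javascript
// Sketch only: `db` is assumed to be a Firestore instance (Admin SDK style).
// "restaurants" and the function name are assumptions for illustration.
async function createOrderWithQueueNumber(db, restaurantId, orderData) {
  const counterRef = db.collection("restaurants").doc(restaurantId);
  const orderRef = db.collection("orders").doc(); // auto-generated ID

  return db.runTransaction(async (tx) => {
    // Read the counter document (reads must come before writes).
    const counterSnap = await tx.get(counterRef);
    const current = (counterSnap.exists && counterSnap.data().queueNumber) || 0;

    // Increment the count value in memory.
    const next = current + 1;

    // Create the new document with this value.
    tx.set(orderRef, { ...orderData, restaurantId, queueNumber: next });

    // Also update the counter document with this value.
    tx.set(counterRef, { queueNumber: next }, { merge: true });

    return next;
  });
}
```

Because the order is created inside the transaction, this would replace the onCreate trigger rather than run alongside it. Every order for one restaurant now contends on a single counter document, which is the scaling cost mentioned above.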

How does Cosmos DB Continuation Token work?

At first sight, it's clear what the continuation token does in Cosmos DB: attaching it to the next query gives you the next set of results. But what does "next set of results" mean exactly?
Does it mean:
1) the next set of results as if the original query had been executed completely without paging at the time of the very first query (skipping the appropriate number of documents)?
2) the next set of results as if the original query had been executed now (skipping the appropriate number of documents)?
3) Something completely different?
Answer 1. would seem preferable but unlikely given that the server would need to store unlimited amounts of state. But Answer 2. is also problematic as it may result in inconsistencies, e.g. the same document may be served multiple times across pages, if the underlying data has changed between the page queries.
Cosmos DB query executions are stateless at the server side. The continuation token is used to recreate the state of the index and track progress of the execution.
"Next set of results" means that the query is executed again, resuming from a "bookmark" left by the previous execution. This bookmark is provided by the continuation token.
Documents created during continuations
They may or may not be returned, depending on where the inserted document falls relative to the bookmark of the query being executed.
Example:
SELECT * FROM c ORDER BY c.someValue ASC
Let us assume the bookmark had someValue = 10. The query engine then resumes processing from the continuation token at someValue = 10.
If you were to insert a new document with someValue = 5 in between query executions, it will not show up in the next set of results.
If the new document is inserted in a "page" that sorts after the bookmark, it will show up in the next set of results.
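The bookmark behaviour described above can be modelled with a small in-memory simulation. This is not the Cosmos DB engine: queryPage and the token shape are hypothetical, and the model assumes distinct someValue values so that "resume strictly after the bookmark" is unambiguous.

```javascript
// In-memory model of "resume from a bookmark" paging. NOT the Cosmos DB
// engine; `queryPage` and the token shape are hypothetical.
function queryPage(docs, pageSize, token) {
  // Re-run the ORDER BY on the current data, as a stateless server would.
  const sorted = [...docs].sort((a, b) => a.someValue - b.someValue);
  // Resume strictly after the bookmarked sort value.
  const rest = token ? sorted.filter((d) => d.someValue > token.lastValue) : sorted;
  const items = rest.slice(0, pageSize);
  const more = rest.length > pageSize;
  return {
    items,
    token: more ? { lastValue: items[items.length - 1].someValue } : null,
  };
}

const docs = [
  { id: "a", someValue: 8 },
  { id: "b", someValue: 10 },
  { id: "c", someValue: 20 },
];
const page1 = queryPage(docs, 2, null);        // a, b; bookmark at someValue = 10

docs.push({ id: "d", someValue: 5 });          // sorts before the bookmark
docs.push({ id: "e", someValue: 15 });         // sorts after the bookmark

const page2 = queryPage(docs, 2, page1.token); // e, c; d is never returned
console.log(page1.items.map((d) => d.id), page2.items.map((d) => d.id));
```

In this model the document with someValue = 5 is skipped because it sorts before the bookmark, while the one with someValue = 15 appears on the second page.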
Documents updated during continuations
The same logic as above applies to updates as well
(see "Chances of duplicates" below)
Documents deleted during continuations
They will not show up in the next set of results.
Chances of duplicates
In case of the below query,
SELECT * FROM c ORDER BY c.remainingInventory ASC
If the remainingInventory was updated after the first set of results and it now satisfies the ORDER BY criteria for the second page, the document will show up again.
Cosmos DB doesn’t provide snapshot isolation across query pages.
However, as per the product team this is an incredibly uncommon scenario because queries over continuations are very quick and in most cases all query results are returned on the first page.
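If backend processing has to tolerate the rare duplicate, one option is to de-duplicate on the client while draining the pages. The sketch below is an assumption-laden illustration: fetchPage is a hypothetical function that takes the previous continuation token and resolves to { items, continuationToken }, and `id` is assumed to be unique across the documents the query can return.

```javascript
// Hypothetical consumer: drains all pages and drops any document whose `id`
// was already seen on an earlier page. `fetchPage(token)` is assumed to
// resolve to { items, continuationToken }, with a falsy token on the last page.
async function drainDistinct(fetchPage) {
  const seen = new Set();
  const results = [];
  let token = null;
  do {
    const page = await fetchPage(token);
    for (const doc of page.items) {
      if (seen.has(doc.id)) continue; // same document served on two pages
      seen.add(doc.id);
      results.push(doc);
    }
    token = page.continuationToken;
  } while (token);
  return results;
}
```

This keeps the first version of a document that was seen, which may be the stale one. It also cannot recover a document that was skipped because an update moved it behind the bookmark.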
Based on preliminary experiments, the answer seems to be option #2, or more precisely:
Documents created after serving the first page are observable on subsequent pages
Documents updated after serving the first page are observable on subsequent pages
Documents deleted after serving the first page are omitted on subsequent pages
Documents are never served twice
The first statement above contradicts information from MSFT (cf. Kalyan's answer). It would be great to get a more qualified answer from the Cosmos DB Team specifying precisely the semantics of retrieving pages. This may not be very important for displaying data in the UI, but may be essential for data processing in the backend, given that there doesn't seem to be any way of disabling paging when performing a query (cf. Are transactional queries possible in Cosmos DB?).
Experimental method
I used Sacha Bruttin's Cosmos DB Explorer to query a collection with 5 documents, because this tool allows playing around with the page size and other request options.
The page size was set to 1, and Cross Partition Queries were enabled. Different queries were tried, e.g. SELECT * FROM c or SELECT * FROM c ORDER BY c.name.
After retrieving page 1, new documents were inserted, and some existing documents (including documents that should appear on subsequent pages) were updated and deleted. Then all subsequent pages were retrieved in sequence.
(A quick look at the source code of the tool confirmed that ResponseContinuationTokenLimitInKb is not set.)

Optimizing the number of reads from firestore server using caching or snapshot listener

I am rendering the following view using Firebase. So basically the search is powered by a Firebase query.
I am using the following code:
Query query = FirebaseUtils.buildQuery(
fireStore, 'customers', filters, lastDocument, documentLimit);
print("query =" + query.toString());
QuerySnapshot querySnapshot = await query.getDocuments();
print("Got reply from firestore. No of items =" + querySnapshot.documents.length.toString());
Questions:
If the user hits the same query, again and again, it still hits the server. I checked this by using doc.metadata.isFromCache and it always returns false.
Will using query snapshots help reduce the number of reads for this search query? I guess not, as the user is changing the query again and again.
Any other way to limit the number of reads?
If the user hits the same query, again and again, it still hits the server. I checked this by using doc.metadata.isFromCache and it always returns false.
If you are online, it will always return false and that's the expected behavior since the listener is always looking for changes on the server. If you want to force the retrieval of the data from the cache while you are online, then you should explicitly specify this to Firestore by adding Source.CACHE to your get() call. If you're offline, it will always return true.
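A cache-first read could be sketched as below, in JavaScript for illustration instead of the Flutter code from the question. It assumes the namespaced (v8-style) Firebase web SDK, where Query.get() accepts an options object with `source` set to "cache", "server" or "default"; getCacheFirst is a hypothetical helper name.

```javascript
// Sketch assuming the namespaced (v8-style) Firebase web SDK, where
// Query.get() accepts { source: "cache" | "server" | "default" }.
// `getCacheFirst` is a hypothetical helper name.
async function getCacheFirst(query) {
  // A cache-only query read resolves with an empty snapshot when nothing
  // matching is cached, so fall back to the server in that case.
  const cached = await query.get({ source: "cache" });
  if (!cached.empty) return cached;
  return query.get({ source: "server" });
}
```

Results served from the cache may be stale relative to the server, so this trade-off only suits searches where slightly old data is acceptable.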
Will using query snapshots help reduce the number of reads for this search query? I guess not, as the user is changing the query again and again.
No, it won't. What does a query snapshot represent? It's basically an object that contains the results of your query. However, if you perform a query "again and again", as long as it's the same query and nothing has changed on the server, you will not be charged for any read operations, because the second time you perform the query the results come from the cache. If you perform a new search each time, you'll always be billed a number of read operations equal to the number of elements returned by your query. Furthermore, if you create new searches and some of the returned elements are already in your cache, you'll be billed a read operation only for the new ones.
Any other way to limit the number of reads?
The simplest method to limit the results of a query is to use a limit() call and pass as an argument the number of elements you want your query to return:
limit(10)
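Combined with a cursor, limit() gives page-at-a-time reads. The sketch below is in JavaScript for illustration and mirrors the lastDocument and documentLimit parameters from the question. The helper name and the orderField parameter are assumptions, added because cursor paging needs a defined ordering.

```javascript
// Hypothetical helper mirroring the question's `lastDocument` and
// `documentLimit` parameters. `orderField` is an assumption, since cursor
// paging needs a stable ordering.
function buildPagedQuery(collectionRef, orderField, documentLimit, lastDocument) {
  let query = collectionRef.orderBy(orderField);
  if (lastDocument) {
    // Continue right after the last document of the previous page.
    query = query.startAfter(lastDocument);
  }
  // Cap the number of documents returned (and billed) per call.
  return query.limit(documentLimit);
}
```

Each call then reads at most documentLimit documents, and passing the last document of one page as lastDocument fetches the next page.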

Firestore, fetch only those documents from a collection which are not present in client cache

I am implementing a one-to-one chat app using firestore in which there is a collection named chat such that each document of a collection is a different thread.
When the user opens the app, the screen should display all threads/conversations of that user including those which have new messages (just like in whatsapp). Obviously one method is to fetch all documents from the chat collection which are associated with this user.
However it seems a very costly operation, as the user might have only few updated threads (threads with new messages), but I have to fetch all the threads.
Is there an optimized and less costly method of doing the same, where only those threads are fetched which have new messages, or more precisely threads which are not present in the user's device cache (either newly created or modified threads)?
Each document in the chat collection has these fields:
senderID: (id of the user who initiated the thread/conversation)
receiverID: (id of the other user in the conversation)
messages: [],
lastMsgTime: (timestamp of last message in this thread)
Currently to load all threads of a certain user, I am applying the following query:
const userID = firebase.auth().currentUser.uid
firebase.firestore().collection('chat').where('senderId', '==', userID)
firebase.firestore().collection('chat').where('receiverId', '==', userID)
and finally I am merging the docs returned by these two queries in an array to render in a flatlist.
In order to know whether a specific thread/document has been updated, the server will have to read that document, which is the charged operation that you're trying to avoid.
The only common way around this is to have the client track when it was last online, and then do a query for documents that were modified since that time. But if you want to show both existing and new documents, this would have to be a separate query, which means that it'd end up in a separate area of the cache. So in that case you'll have to set up your own offline storage on top of Firestore's, which is more work than I'm typically willing to do.
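The "modified since I was last online" query could look roughly like this. It assumes a v8-style Firestore `db` and uses the field names from the question's queries. lastSyncTime is a hypothetical value that the client has to persist itself, and the function name is made up for illustration.

```javascript
// Sketch assuming a v8-style Firestore `db`. Field names follow the
// question's queries; `lastSyncTime` is a hypothetical value the client
// stores itself (for example, the time of its previous successful sync).
function buildUpdatedThreadQueries(db, userID, lastSyncTime) {
  const chat = db.collection("chat");
  // One query per role, each restricted to threads changed since the last sync.
  return ["senderId", "receiverId"].map((field) =>
    chat.where(field, "==", userID).where("lastMsgTime", ">", lastSyncTime)
  );
}
```

Combining an equality filter with a range filter on another field may require a composite index. The results still have to be merged into whatever local copy of the older threads the app maintains, which is the extra offline storage described above.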

How can you create a transaction/batch write between multiple Firestore instances?

Firebase allows having multiple projects in a single application.
// Initialize another app with a different config
var secondary = firebase.initializeApp(secondaryAppConfig, "secondary");
// Retrieve the database.
var secondaryDatabase = secondary.database();
Example:
Project 1 has my users collection; Project 2 has my friends collection (suppose there's a reason for that). When I add a new friend in the Project 2 database, I want to increment the friendsCount in the user document in Project 1. For this reason, I want to create a transaction/batch write to ensure consistency in the data.
How can I achieve this? Can I create a transaction or a batch write between different Firestore instances?
No, you cannot use the database transaction feature across multiple databases.
If absolutely required, I'd probably instead create a custom locking feature. From wiki,
To allow several users to edit a database table at the same time and also prevent inconsistencies created by unrestricted access, a single record can be locked when retrieved for editing or updating. Anyone attempting to retrieve the same record for editing is denied write access because of the lock (although, depending on the implementation, they may be able to view the record without editing it). Once the record is saved or edits are canceled, the lock is released. Records can never be saved so as to overwrite other changes, preserving data integrity.
In database management theory, locking is used to implement isolation among multiple database users. This is the "I" in the acronym ACID.
Source: https://en.wikipedia.org/wiki/Record_locking
It's been three years since the question, I know, but since I needed the same thing I found a working solution to perform the double (or even ^n) transaction. You have to nest the transactions like this.
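// The inner callback must itself be the async function; returning
// `async () => {...}` from it would hand back a function that never runs.
db1.runTransaction(t1 => db2.runTransaction(async t2 => {
  await t1.set(.....
  await t2.update(.....
  etc....
})).then(...).catch(...)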
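Since an error thrown inside the nested callback propagates through both promises, a failure while the writes are being prepared aborts both transactions. Note, however, that the two commits are still separate: the inner transaction on db2 commits before the outer one on db1, so if the db1 commit then fails, the db2 changes are not rolled back, and a retry of the outer transaction runs the inner one again. This pattern reduces the window for inconsistency but is not atomic across the two databases.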
