Firestore transaction contention but with different documents - firebase

For example, I have 10 documents in my collection. 10 requests come in at nearly the same second and each runs the same query. Each request starts its own transaction that reads 1 document and then deletes that document. The Firestore documentation on data contention makes it sound like contention errors happen when more than one transaction operates on the same document X number of times (it is not documented how many times).
Cloud Firestore resolves data contention by delaying or failing one of the operations.
https://cloud.google.com/firestore/docs/transaction-data-contention
However, in this case, since 1 of those transactions committed, I am assuming the other 9 that tried to operate on that same document will retry, because the document from the query was "changed" and they couldn't commit. Then those 9 transactions will try to do the same thing, but on another document, and this will continue until every request has finished deleting 1 document and there are no more active transactions.
Would these repeatedly retried transactions be ABORTED due to contention, even though the contended document has been different each time? Or would they just keep getting delayed and retried, because the contention happens on a different document on each attempt?

According to the documentation, the transaction will retry a "finite number of times". This number is dependent on how the SDK itself is configured, which may be different for various SDK platforms and versions. It doesn't matter which contended document(s) caused the retry. The max number of retries is absolute for that transaction in order to avoid excessive work.
Newer versions of the SDK allow configuration of the number of retries (e.g. Android 24.4.0 lets you specify TransactionOptions).
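In the Node.js Admin SDK, for example, the cap can be passed to runTransaction as a second argument. Here is a minimal sketch of the scenario in the question, assuming the Admin SDK; the "items" collection name and the attempt count are just placeholders:

```typescript
import { getFirestore } from "firebase-admin/firestore";

const db = getFirestore();

// Each incoming request pops one document: read one document via a query, then delete it.
// maxAttempts caps the total number of attempts for this transaction, no matter which
// document happened to be contended on each retry.
async function popOneDocument(): Promise<string | null> {
  return db.runTransaction(
    async (tx) => {
      const snap = await tx.get(db.collection("items").limit(1));
      if (snap.empty) return null;          // nothing left to delete
      tx.delete(snap.docs[0].ref);          // delete whatever document this attempt read
      return snap.docs[0].id;
    },
    { maxAttempts: 5 }                      // total attempts, analogous to TransactionOptions on Android
  );
}
```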

Related

Managing Firestore document read count in MV3 browser extension

In the MV2 extension, I attach a Firestore onSnapshot listener in the persistent background page. As I understand it: 1. Firestore downloads all documents when the listener is first attached, and afterwards, 2. it only downloads changed documents when they change. Since this listener persists over several hours, the total number of Firestore document reads (and hence the cost) is low.
But in the MV3 extension, the service worker (which houses the Firestore listener) is destroyed after five minutes. Therefore, the onSnapshot listener will be destroyed and re-attached several times in just a few hours. On every re-attachment, that listener would potentially re-download all of the user data. So, if the listener gets destroyed and attached five times, we incur five times as many document read counts (and hence the cost) in an MV3 extension as compared to an MV2 extension.
I'd like to understand:
Does using IndexedDB persistence help significantly reduce the document read counts, even when the service worker is restarted?
In case we are not using IDB persistence, how is the billing done? For example, the billing docs state that:
Also, if the listener is disconnected for more than 30 minutes (for example, if the user goes offline), you will be charged for reads as if you had issued a brand-new query.
If I re-attach the listener within 15-20 minutes, does it again incur document reads on all user data?
I tried writing a small example myself and monitored the results in Cloud Console Monitoring under "Firestore Instance - Document Reads", but I was not able to get clear results from it.
Note: For the purpose of discussion, I will avoid the workaround for making service workers persistent, and focus on a worst case assuming that this workaround does not work.
Does using IndexedDB persistence help significantly reduce the document read counts, even when the service worker is restarted?
Yes. Upon a reconnect within the 30-minute interval, most documents can be served from the disk cache and won't have to be read from the server.
In case we are not using IDB persistence, how is the billing done? ... If I re-attach the listener within 15-20 minutes, does it again incur document reads on all user data?
If there is no data in the disk cache and no existing listener on the data, the documents will have to be read from the server, and you will be charged for each of those document reads.
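As a hedged sketch with the modular web SDK (assuming IndexedDB persistence is supported in the worker context; the project config and collection name are placeholders), persistence is enabled once before the listener is attached:

```typescript
import { initializeApp } from "firebase/app";
import {
  getFirestore,
  enableIndexedDbPersistence,
  collection,
  onSnapshot,
} from "firebase/firestore";

const app = initializeApp({ projectId: "my-project" });   // placeholder config
const db = getFirestore(app);

// With the IndexedDB cache enabled, a listener re-attached after the MV3 service worker
// restarts can serve unchanged documents from disk; only changed documents need to be
// fetched from (and billed by) the server.
enableIndexedDbPersistence(db)
  .catch((err) => console.warn("persistence unavailable:", err))
  .then(() => {
    onSnapshot(collection(db, "userData"), (snapshot) => {
      snapshot.docChanges().forEach((change) => {
        // handle added / modified / removed documents here
      });
    });
  });
```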

Firestore write stream exhausted when exceeding 15 writes per second

We recently started getting write stream exhausted errors:
@firebase/firestore: Firestore (8.10.0): FirebaseError: [code=resource-exhausted]: 8 RESOURCE_EXHAUSTED: Write stream exhausted maximum allowed queued writes.
@firebase/firestore: Firestore (8.10.0): Using maximum backoff delay to prevent overloading the backend.
We're sending a few million writes, in batches of 500, to a Firestore collection every 15 seconds. If we run it any more often than every 15 seconds, we get the error above; if we exceed that by too much, it eventually hard crashes. The advice we got from Firebase support was to instead write to a bunch of subcollections, but their limits clearly show 500 writes/second even when writing sequentially to a single collection. Our document IDs aren't quite sequential, but are often close, like client-id:guid, where the client id is often the same for most of the writes. Any ideas on what we might be doing wrong and how to fix it? We've tried sending smaller, more frequent batches and larger, less frequent batches. We created a new project and didn't create any indexes to see if they were the problem. We've sent the requests from different clients, none of which are short on resources.
Cloud Firestore does not stop you from exceeding these thresholds, but doing so affects performance; the "batches of 500" comes from this hard limit and this soft limit.
The main issue here is hotspotting, since the document IDs are client-id:guid, where the client id is often the same for most of the writes. I'd advise following the best practices, or your requests will get queued up and eventually return RESOURCE_EXHAUSTED.
Our issue turned out to be the batch sizes. We started sending batches of 50 instead of 500 and were able to write about 400/s. We still see the backoff errors but we're not losing any data. I haven't had time to test parallel individual writes to see if we can gain a little more write speed, but I'll update this answer once I do.
Also, we tried to increase our batch write speeds with the 500/50/5 method they recommend and it still hit the resource exhausted errors.
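For reference, here is a minimal sketch of the smaller-batch approach with the Node.js Admin SDK (the "events" collection name and the batch size are assumptions based on what worked above):

```typescript
import { getFirestore } from "firebase-admin/firestore";

const db = getFirestore();
const BATCH_SIZE = 50;   // well under the 500-write hard limit; 50 avoided the errors above

async function writeInSmallBatches(docs: Array<{ id: string; data: Record<string, unknown> }>) {
  for (let i = 0; i < docs.length; i += BATCH_SIZE) {
    const batch = db.batch();
    for (const { id, data } of docs.slice(i, i + BATCH_SIZE)) {
      batch.set(db.collection("events").doc(id), data);
    }
    // Committing each batch before queuing the next keeps the pending-write queue
    // small and helps avoid RESOURCE_EXHAUSTED.
    await batch.commit();
  }
}
```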

Cloud Datastore transaction terminated without explicit rollback defined

From the following document: https://cloud.google.com/datastore/docs/concepts/transactions
What would happen if a transaction fails with no explicit rollback defined? For example, if we're performing a put() operation on value arguments.
The document states that transactions should be idempotent. What does this mean with respect to the put() operation? It is not clear how idempotency applies in this context.
How do we detect failure, given that errors from a commit are not reliable according to the documentation?
We are seeing symptoms where put() against a value argument sometimes partially saves the data. Note that we do not have an explicit rollback defined.
As you may already know, Datastore transactions are guaranteed to be atomic, which means that it applies the all-or-nothing principle; either all operations succeed or they all fail. This ensures that the data in your database remains consistent over time.
Now, regardless of whether you execute put() or any other operation in your transaction, your code should always ensure that your transaction has either successfully committed or rolled back. This means that if you aren't fully sure whether the commit succeeded, you should explicitly issue a rollback.
However, there may be some exceptions where a commit might fail, and this doesn't necessarily mean that no data was written to your database. The documentation even points out that "you can receive errors in cases where transactions have been committed."
The simple way to detect transaction failures is to add a try/catch block in your code for when an Exception (failed transactional operation) or DatastoreException (errors related to Datastore, such as a failed commit) is thrown. I believe you may already have an answer in this Stack Overflow post about this particular question.
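A minimal sketch of that pattern with the Node.js Datastore client (the entity kind and data shape are assumptions): wrap the transaction in try/catch, and if you can't be sure the commit succeeded, issue an explicit rollback.

```typescript
import { Datastore } from "@google-cloud/datastore";

const datastore = new Datastore();

async function savePlayer(playerId: string, data: Record<string, unknown>) {
  const transaction = datastore.transaction();
  try {
    await transaction.run();                                             // start the transaction
    transaction.save({ key: datastore.key(["Player", playerId]), data });
    await transaction.commit();                                          // may throw even if data was written
  } catch (err) {
    await transaction.rollback();   // explicit rollback when the commit outcome is unknown
    throw err;                      // surface the error to the caller
  }
}
```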
A good practice is to make your transactions idempotent whenever possible. In other words, if you're executing a transaction that includes a put() write to your database, and this operation fails and needs to be retried, the end result should ideally remain the same.
A real-world example: you're trying to transfer some money to a friend; the transaction consists of withdrawing 20 USD from your bank account and depositing the same amount into your friend's bank account. If the transaction were to fail and had to be retried, it should still operate with the same amount of money (20 USD) as the final result.
Keep in mind that the Datastore API doesn't retry transactions by default, but you can add your own retry logic to your code, as per the documentation.
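As a hedged sketch of caller-side retry logic on top of the savePlayer sketch above (the attempt count and backoff values are arbitrary): because put() under the same key always writes the same entity, re-running the whole transaction after a failure leaves the database in the same state.

```typescript
async function saveWithRetries(playerId: string, data: Record<string, unknown>, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await savePlayer(playerId, data);               // idempotent: same key, same data
    } catch (err) {
      if (i === attempts - 1) throw err;                     // out of attempts, give up
      await new Promise((r) => setTimeout(r, 2 ** i * 100)); // simple exponential backoff
    }
  }
}
```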
In summary, if a transaction is interrupted and your logic doesn't handle the failure accordingly, you may eventually see inconsistencies in the data of your database.

Using a Mutex with Firebase to lock a document while avoiding many retry attempts when scaling up

I have a large collection of one million documents in Firebase that I treat as a stack, where the first element gets read and removed from the stack. My main problem is that I have over a thousand connections trying to access the collection, and connections keep receiving the same document. To prevent duplicate results, I've resorted to using a mutex, as referenced in the post below.
Cloud Firestore document locking
I am using a mutex to lock each document before removing it from the collection. I use transactions to ensure the mutex owner is not overwritten by other connections and to check that the document has not already been removed.
The problem I have with this solution is that as we scale up, more connections are fighting over retrieving a mutex lock. Each connection spends a long time retrying until it successfully locks a document. Avoiding long retries would allow for faster response times and fewer reads.
So in summary, a connection tries to retrieve a document. It retrieves the document but fails to create a lock because another incoming connection just locked it. So it looks for another document and fails again. It keeps retrying until it beats another connection to locking a document.
Is it possible to increase throughput and keep read costs low as I scale up?
Yeah, I doubt such a mutex is going to help your throughput.
How important is it that documents are processed in the exact order that they are in the queue? If it is not crucial, you could consider having each client request the first N documents, and then picking one at random to lock-and-process. That would improve your throughput up to N times.
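A rough sketch of that idea with the Admin SDK (the "queue" collection, the ordering field, and N are assumptions): each worker fetches the first N queue documents, picks one at random, and only then runs a transaction to claim it.

```typescript
import { getFirestore } from "firebase-admin/firestore";

const db = getFirestore();
const CANDIDATES = 10;   // N: spreading workers over N head documents cuts lock collisions

async function claimTask() {
  const head = await db.collection("queue").orderBy("createdAt").limit(CANDIDATES).get();
  if (head.empty) return null;

  // Pick one of the first N documents at random instead of always fighting over the first one.
  const candidate = head.docs[Math.floor(Math.random() * head.docs.length)];

  return db.runTransaction(async (tx) => {
    const fresh = await tx.get(candidate.ref);
    if (!fresh.exists) return null;   // another worker already claimed and removed it
    tx.delete(candidate.ref);         // claiming = removing it from the queue
    return fresh.data();
  });
}
```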

Server Side Locks in Cloud Firestore

I'm curious about the behavior of the locks that are performed when doing server side transactions on Cloud Firestore as mentioned in this video: https://www.youtube.com/watch?time_continue=750&v=dOVSr0OsAoU
My transaction will be reading multiple documents and placing locks on them. My question is do these locks restrict all access to the documents - including concurrent reads from client code that isn't part of a transaction? Or do they only restrict writes?
If they do restrict reads is there any way around this - it could lead to severe slowdown in the app I'm working on.
Also in the case that a transaction tries to lock documents that are already locked - what is the retry pattern - how often does it retry, and is there an exponential backoff?
Thanks!
My transaction will be reading multiple documents and placing locks on them.
A transaction first reads the value of a property within a document in order to perform the write operation, so it requires round-trip communication with the server to ensure that the code inside the transaction completes successfully.
My question is do these locks restrict all access to the documents - including concurrent reads from client code that isn't part of a transaction?
The answer is no, concurrent users can read the content of the document even if you perform a write operation using a transaction.
Also in the case that a transaction tries to lock documents that are already locked - what is the retry pattern - how often does it retry, and is there an exponential backoff?
According to the official documentation on Firestore transactions, a transaction can fail only in the following cases:
The transaction contains read operations after write operations. Read operations must always come before any write operations.
The transaction read a document that was modified outside of the transaction. In this case, the transaction automatically runs again. The transaction is retried a finite number of times.
The transaction exceeded the maximum request size of 10 MiB.
Transaction size depends on the sizes of documents and index entries modified by the transaction. For a delete operation, this includes the size of the target document and the sizes of the index entries deleted in response to the operation.
A failed transaction returns an error and does not write anything to the database. You do not need to roll back the transaction; Cloud Firestore does this automatically.
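To make the ordering rule concrete, here is a minimal server-side sketch with the Admin SDK (the "accounts" collection and balance field are assumptions): all reads are issued first, then the writes. If either document is modified outside the transaction before commit, the SDK re-runs the function a finite number of times.

```typescript
import { getFirestore, FieldValue } from "firebase-admin/firestore";

const db = getFirestore();

async function transfer(fromId: string, toId: string, amount: number) {
  await db.runTransaction(async (tx) => {
    const fromRef = db.collection("accounts").doc(fromId);
    const toRef = db.collection("accounts").doc(toId);

    // Reads first - both documents are read inside the transaction.
    const [from] = await Promise.all([tx.get(fromRef), tx.get(toRef)]);
    if ((from.data()?.balance ?? 0) < amount) {
      throw new Error("insufficient funds");   // aborts the transaction; nothing is written
    }

    // Writes only after every read.
    tx.update(fromRef, { balance: FieldValue.increment(-amount) });
    tx.update(toRef, { balance: FieldValue.increment(amount) });
  });
}
```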
