Server Side Locks in Cloud Firestore - firebase

I'm curious about the behavior of the locks that are performed when doing server side transactions on Cloud Firestore as mentioned in this video: https://www.youtube.com/watch?time_continue=750&v=dOVSr0OsAoU
My transaction will be reading multiple documents and placing locks on them. My question is do these locks restrict all access to the documents - including concurrent reads from client code that isn't part of a transaction? Or do they only restrict writes?
If they do restrict reads is there any way around this - it could lead to severe slowdown in the app I'm working on.
Also in the case that a transaction tries to lock documents that are already locked - what is the retry pattern - how often does it retry, and is there an exponential backoff?
Thanks!

My transaction will be reading multiple documents and placing locks on them.
A transaction operation is first reading the value of a property within a document in order to perform the write operation. So it requires round trip communications with server in order to ensure that the code inside the transaction completes successfully.
My question is do these locks restrict all access to the documents - including concurrent reads from client code that isn't part of a transaction?
The answer is no, concurrent users can read the content of the document even if you perform a write operation using a transaction.
Also in the case that a transaction tries to lock documents that are already locked - what is the retry pattern - how often does it retry, and is there an exponential backoff?
According to the official documentation regarding Firestore transactions, a transaction can fail only the following cases:
The transaction contains read operations after write operations. Read operations must always come before any write operations.
The transaction read a document that was modified outside of the transaction. In this case, the transaction automatically runs again. The transaction is retried a finite number of times.
The transaction exceeded the maximum request size of 10 MiB.
Transaction size depends on the sizes of documents and index entries modified by the transaction. For a delete operation, this includes the size of the target document and the sizes of the index entries deleted in response to the operation.
A failed transaction returns an error and does not write anything to the database. You do not need to roll back the transaction; Cloud Firestore does this automatically.

Related

Firestore transaction contention but with different documents

For example, I have 10 documents in my collection. 10 requests come in near the same second and will run the same query. They each will start their own transaction that will try and read 1 document, and then delete that document. Given the firestore documentation with document contention, it is made to seem like contention errors happen when more than one transaction occurs on the same document X amount of times (it is not documented how many times).
Cloud Firestore resolves data contention by delaying or failing one of the operations.
https://cloud.google.com/firestore/docs/transaction-data-contention
However, in this case since 1 of those transactions committed, I am assuming the other 9 that tried to operate on that same document, will retry because the document from the query was "changed" and couldn't commit. Then the next 9 transactions will try to do the same thing, but on another document, and this will continue until all requests finished deleting 1 document and there are no more active transactions.
Would the retry rules of these transactions that kept getting retried, be ABORTED due to contention, even though it's been a different document each time? Or would these transactions just keep getting delayed and retried because the contention is happening on different documents on each attempt?
According to the documentation, the transaction will retry a "finite number of times". This number is dependent on how the SDK itself is configured, which may be different for various SDK platforms and versions. It doesn't matter which contended document(s) caused the retry. The max number of retries is absolute for that transaction in order to avoid excessive work.
Newer versions of the SDK allow configuration of the number of retries (e.g. Android 24.4.0 lets you specify TransactionOptions).

Cloud Datastore transaction terminated without explicit rollback defined

From following document: https://cloud.google.com/datastore/docs/concepts/transactions
What would happen if transaction fails with no explicit rollback defined? For example, if we're performing put() operation on value arguments.
The document states that transaction should be idempotent, what does this mean with respect to put() operation? It is not clear how idempotency is applied in this context.
How do we detect failure if failure from commit is not reliable according to the documentation?
We are seeing some symptoms where put() against value argument is sometimes partially saving the data. Note we do not have explicit rollback defined.
As you may already know, Datastore transactions are guaranteed to be atomic, which means that it applies the all-or-nothing principle; either all operations succeed or they all fail. This ensures that the data in your database remains consistent over time.
Now, regardless whether you execute put or any other operation in your transaction, your implementation of the code should always ensure that your transaction has either successfully commited or rolled back. This means that if you aren't fully sure whether the commit succeeded, you should explicitly issue a rollback.
However, there may be some exceptions where a commit might fail, and this doesn't necessarily mean that no data was written to your database. The documentation even points out that "you can receive errors in cases where transactions have been committed."
The simple way to detect transaction failures would be to add a try/catch block in your code for when an Exception (failed transactional operation) or DatastoreException (errors related to Datastore - failed commit) are thrown. I believe that you may already have an answer in this Stackoverflow post about this particular question.
A good practice is to make your transactions idempotent whenever possible. In other words, if you're executing a transaction that includes a write operation put() to your database, if this operation were to fail and needed to be retried, the end result should ideally remain the same.
A real world example can be - you're trying to transfer some money to your friend; the transaction consists of withdrawing 20 USD from your bank account and depositing this same amount into your friend's bank account. If the transaction were to fail and had to be retried, the transaction should still operate with the same amount of money (20 USD) as the final result.
Keep in mind that the Datastore API doesn't retry transactions by default, but you can add your own retry logic to your code, as per the documentation.
In summary, if a transaction is interrupted and your logic doesn't handle the failure accordingly, you may eventually see inconsistencies in the data of your database.

How to do batched writes as part of a transaction

I found myself in a situation where I want to perform some operations on the database that should be handled in a single transaction. One of those operations is injecting > 500 documents, so this is throwing an error because it's hitting
maximum 500 writes allowed per request
In order to work around that, you could use batched writes, but I can't figure out how to do batched writes as part of a transaction. It seems like transaction.commit() is not a thing and in the docs transactions and batched writes appear to be two separate concepts.
Generally speaking, we are using transactions to have consistent data. The recommendation that you get:
you could use batched writes
It is for the exact same reason. Unfortunately, you cannot mix them. You have to choose one or the other. Realistic speaking, both the batch and the transaction are used for atomic updates.
A transaction is similar to batch and as the docs states:
All of the operations succeed, or none of them are applied.
The main difference between a batch write and a transaction is that a batch just writes, while a transaction reads and right after then writes.
So the solution in your case is to use Firestore batched-writes to perform 500 operation at a time.
As you have most probably read in the doc:
The Transaction object passed to a transaction's updateFunction
provides the methods to read and write data within the transaction
context.
and this object, in the Client SDKs, has only four methods: get(), set(), update() and delete() which all take a single Firestore Document as parameter.
With the Node.js Server SDK for Google Cloud Firestore, you will note that there is an additional method, getAll(), which "retrieves multiple documents from Firestore. Holds a pessimistic lock on all returned documents".
So, at the time of writing, there is no possibility, to "mix" a Transaction and a Batched Write.

Using a Mutex with Firebase to lock a document while avoiding many retry attempts when scaling up

I have a large one million document collection with Firebase that I treat as a stack array where the first element gets read and removed from the stack. My main problem is I have over a thousand connections trying to access the collection and I am having issues with connections receiving the same document. To prevent duplicates results, I've resorted to using Mutex as referenced by this post below..
Cloud Firestore document locking
I am using a Mutex to lock each document before removing it from the collection. I use transactions to ensure the mutex owner is not getting overwritten by other connections or to check if the document has not been removed yet.
The problem I have with this solution is as we scale up, more connections are fighting over retrieving a mutex lock. Each connection spends a long time retrying until it successfully locks a document. Avoiding long retries will allow for faster response time and less reads.
So in summary, a connection tries to retrieve a document. It retrieves the document but fails to successfully create a lock because another incoming connection just locked it. So it looks for another document and also fails. It keeps retrying until it beats another connnection to locking the document.
Is it possible to increase throughput and keep read costs low as I scale up?
Yeah, I doubt such a mutex is going to help your throughput.
How important is it that documents are processed in the exact order that they are in the queue? If it is not crucial, you could consider having each client request the first N documents, and then picking one at random to lock-and-process. That would improve your throughput up to N times.

How does SQLite prevent deadlocks with deferred transactions?

According to the documentation on deferred ransactions:
The default transaction behavior is deferred. (...) The first read operation against a database creates a SHARED lock and
the first write operation creates a RESERVED lock.
Also according to the documentation on locks:
Any number of processes can hold SHARED locks at the same time (...)
Only a single RESERVED lock may be active at one time, though multiple
SHARED locks can coexist with a single RESERVED lock
This sounds like a multiple readers/single writer lock with arbitrary reader-to-writer promotion mechanism, which is known to be a deadlock hazard:
A starts transaction
B starts transaction
A acquires SHARED lock and reads something
B acquires SHARED lock and reads something
A acquires RESERVED lock and prepares to write something. It can't write as long as there are other SHARED locks so it blocks.
B wishes to write so tries to take RESERVED lock. There is already another RESERVED lock so it blocks until it is released, still holding the SHARED lock.
Deadlock.
So how does SQLite get around this? Two possible solutions come to my mind, but both of them seem to break the whole idea of a transaction:
Would-be writers release the SHARED locks before acquiring RESERVED. This would break atomicity between reads and writes.
B doesn't block when trying to take a RESERVED lock, but errors-out. This would mean all the reads would need to be repeated and significantly complicates API usage.
Am I missing something? How does SQLite deal with this? Why would this seemingly dangerous type of transaction be the default?
By simple trial and error, I discovered that they took the error-out route.
In the given scenario, when B tries to take RESERVED, it will first wait for PRAGMA busy_timeout milliseconds. Then it will report Error: database is locked. The transaction will still be active, so an immediate retry is possible.
If A afterwards tries to COMMIT (or if it runs out of in-memory cache), it will take the PENDING lock (preventing additional SHARED locks) and then wait for EXCLUSIVE. If some SHARED locks remain after PRAGMA busy_timeout milliseconds, it will report Error: database is locked. The transaction will still be active, so an immediate retry is possible.
In other words, the deadlock prevention mechanism in use is timeout. However, it does require the API users to cooperate by rolling back and trying again.
As a guideline:
Use just BEGIN TRANSACTION (or explicitly BEGIN DEFERRED TRANSACTION) when you only expect to read. Writes could possibly fail, forcing you to rollback and retry the entire transaction again.
Use BEGIN IMMEDIATE TRANSACTION when you expect to maybe write at some point. This will block all other writers and all other immediate maybe-writers.
BEGIN EXCLUSIVE TRANSACTION will immediately block until all other locks are released. I have no idea why anyone would want this. Possibly to prepare for some data which needs to be written to disk as quickly as possible once it arrives? EDIT: It seems to be the only way to prevent timeouts at arbitrary points after beginning a transaction.

Resources