Cloud Datastore transaction terminated without explicit rollback defined - google-cloud-datastore

From following document: https://cloud.google.com/datastore/docs/concepts/transactions
What would happen if a transaction fails with no explicit rollback defined? For example, if we're performing a put() operation on value arguments.
The document states that transactions should be idempotent. What does this mean with respect to a put() operation? It is not clear how idempotency applies in this context.
How do we detect failure if, according to the documentation, the error returned from a commit is not reliable?
We are seeing symptoms where put() against a value argument sometimes partially saves the data. Note that we do not have an explicit rollback defined.

As you may already know, Datastore transactions are guaranteed to be atomic, meaning they follow the all-or-nothing principle: either all operations succeed or none of them are applied. This ensures that the data in your database remains consistent over time.
Now, regardless of whether you execute put() or any other operation in your transaction, your code should always ensure that the transaction has either successfully committed or been rolled back. This means that if you aren't fully sure whether the commit succeeded, you should explicitly issue a rollback.
However, there may be some exceptions where a commit might fail, and this doesn't necessarily mean that no data was written to your database. The documentation even points out that "you can receive errors in cases where transactions have been committed."
The simplest way to detect transaction failures is to add a try/catch block to your code for when an Exception (a failed transactional operation) or a DatastoreException (a Datastore-related error, such as a failed commit) is thrown. I believe you may already find an answer in this Stack Overflow post about this particular question.
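As a minimal sketch of that pattern, using the Python google-cloud-datastore client (the "Order" kind is an illustrative assumption, and this client surfaces errors as google.api_core exceptions rather than a literal DatastoreException):

```python
from google.api_core import exceptions as gexc
from google.cloud import datastore

client = datastore.Client()

def save_order(order_id, data):
    txn = client.transaction()
    txn.begin()
    try:
        entity = datastore.Entity(key=client.key("Order", order_id))
        entity.update(data)
        txn.put(entity)
        txn.commit()
    except gexc.GoogleAPICallError:
        # If the commit actually went through server-side, this rollback
        # cannot undo it (which is why idempotency matters below);
        # otherwise it cleanly abandons the unfinished transaction.
        txn.rollback()
        raise
```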
A good practice is to make your transactions idempotent whenever possible. In other words, if a transaction that includes a write operation such as put() fails and needs to be retried, the end result should remain the same.
A real-world example: you're transferring money to a friend. The transaction consists of withdrawing 20 USD from your bank account and depositing the same amount into your friend's account. If the transaction fails and has to be retried, it should still operate on the same amount of money (20 USD) and produce the same final result.
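For illustration, one way to achieve this with the Python google-cloud-datastore client is to write a hypothetical Transfer marker entity inside the same transaction, so a retry that finds the marker becomes a no-op (the kinds and the balance property are assumptions, not from the question):

```python
from google.cloud import datastore

client = datastore.Client()

def transfer(transfer_id, src_key, dst_key, amount):
    with client.transaction():
        marker_key = client.key("Transfer", transfer_id)
        if client.get(marker_key) is not None:
            return  # this transfer already committed; retrying is a no-op
        src = client.get(src_key)  # reads go through the open transaction
        dst = client.get(dst_key)
        src["balance"] -= amount
        dst["balance"] += amount
        marker = datastore.Entity(key=marker_key)
        client.put_multi([src, dst, marker])  # committed atomically on exit
```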
Keep in mind that the Datastore API doesn't retry transactions by default, but you can add your own retry logic to your code, as per the documentation.
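A sketch of such retry logic, assuming your transactional work is wrapped in a function and that contention surfaces as google.api_core Aborted or ServiceUnavailable errors:

```python
import random
import time

from google.api_core import exceptions as gexc

def run_with_retries(fn, attempts=5):
    """Retry a transactional function with exponential backoff and jitter."""
    for n in range(attempts):
        try:
            return fn()
        except (gexc.Aborted, gexc.ServiceUnavailable):
            time.sleep(0.1 * (2 ** n) + random.uniform(0, 0.1))
    raise RuntimeError("transaction did not commit after %d attempts" % attempts)
```

You would then call, for example, run_with_retries(lambda: transfer("t-123", src_key, dst_key, 20)).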
In summary, if a transaction is interrupted and your logic doesn't handle the failure accordingly, you may eventually see inconsistencies in the data of your database.

Related

Understanding what operation types will cause a failed transaction in GAE datastore

After reading the documentation, there is one thing that is not completely clear to me, I am hoping someone can clarify.
Obviously, if you have two read-then-write transactions that occur concurrently, one of them will fail. However, if you have one read-then-write transaction in progress and another read occurs that is not part of a transaction, will that non-transactional read cancel the transaction? For example, a transaction that reads then writes a payment record should not be cancelled by a non-transactional "Get all payment data" report. Is that correct?
Yes, that's correct. Non-transactional reads and read-only transactions do not contend with read-then-write transactions.
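To make the distinction concrete, a sketch with the Python google-cloud-datastore client (the Payment kind and its properties are assumptions):

```python
from google.cloud import datastore

client = datastore.Client()
payment_key = client.key("Payment", "p-42")

# Read-then-write transaction: its read participates in contention checks,
# so a concurrent read-then-write transaction touching the same entity
# would cause one of the two to fail.
with client.transaction():
    payment = client.get(payment_key)  # assumes the entity exists
    payment["status"] = "settled"
    client.put(payment)

# Non-transactional read: it never contends with (or cancels) the
# transaction above.
report = list(client.query(kind="Payment").fetch())
```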

How to do batched writes as part of a transaction

I found myself in a situation where I want to perform several operations on the database that should be handled in a single transaction. One of those operations inserts more than 500 documents, so it throws an error because it hits
maximum 500 writes allowed per request
To work around that, you could use batched writes, but I can't figure out how to do batched writes as part of a transaction. It seems like transaction.commit() is not a thing, and in the docs transactions and batched writes appear to be two separate concepts.
Generally speaking, we use transactions to keep data consistent. The recommendation you got:
you could use batched writes
exists for the exact same reason. Unfortunately, you cannot mix them; you have to choose one or the other. Realistically speaking, both batches and transactions are used for atomic updates.
A transaction is similar to a batch, and as the docs state:
All of the operations succeed, or none of them are applied.
The main difference between a batched write and a transaction is that a batch only writes, while a transaction reads and then writes.
So the solution in your case is to use Firestore batched writes to perform 500 operations at a time.
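As a sketch of that workaround with the Python google-cloud-firestore client (the items collection is an assumption; note that each batch commits atomically on its own, but the batches are not atomic as a group):

```python
from google.cloud import firestore

db = firestore.Client()

def write_in_chunks(items, chunk_size=500):
    """Commit (doc_id, data) pairs in batches of at most 500 writes each."""
    for i in range(0, len(items), chunk_size):
        batch = db.batch()
        for doc_id, data in items[i:i + chunk_size]:
            batch.set(db.collection("items").document(doc_id), data)
        batch.commit()  # atomic per batch, not across batches
```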
As you have most probably read in the doc:
The Transaction object passed to a transaction's updateFunction provides the methods to read and write data within the transaction context.
and this object, in the Client SDKs, has only four methods: get(), set(), update(), and delete(), all of which take a single Firestore document as a parameter.
With the Node.js Server SDK for Google Cloud Firestore, you will note that there is an additional method, getAll(), which "retrieves multiple documents from Firestore. Holds a pessimistic lock on all returned documents".
So, at the time of writing, there is no way to "mix" a Transaction and a Batched Write.
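For comparison, here is what the read-then-write pattern looks like with the Python server client, as a sketch (the users collection and credit field are assumptions):

```python
from google.cloud import firestore

db = firestore.Client()

@firestore.transactional
def add_credit(transaction, ref, amount):
    snapshot = ref.get(transaction=transaction)  # reads must come first
    transaction.update(ref, {"credit": snapshot.get("credit") + amount})

add_credit(db.transaction(), db.collection("users").document("alice"), 10)
```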

Server Side Locks in Cloud Firestore

I'm curious about the behavior of the locks that are performed when doing server side transactions on Cloud Firestore as mentioned in this video: https://www.youtube.com/watch?time_continue=750&v=dOVSr0OsAoU
My transaction will be reading multiple documents and placing locks on them. My question is do these locks restrict all access to the documents - including concurrent reads from client code that isn't part of a transaction? Or do they only restrict writes?
If they do restrict reads is there any way around this - it could lead to severe slowdown in the app I'm working on.
Also in the case that a transaction tries to lock documents that are already locked - what is the retry pattern - how often does it retry, and is there an exponential backoff?
Thanks!
My transaction will be reading multiple documents and placing locks on them.
A transaction first reads the value of a property within a document in order to perform the write operation, so it requires round-trip communication with the server to ensure that the code inside the transaction completes successfully.
My question is do these locks restrict all access to the documents - including concurrent reads from client code that isn't part of a transaction?
The answer is no: concurrent users can read the content of the document even while you perform a write operation on it inside a transaction.
Also in the case that a transaction tries to lock documents that are already locked - what is the retry pattern - how often does it retry, and is there an exponential backoff?
According to the official documentation on Firestore transactions, a transaction can fail only in the following cases:
The transaction contains read operations after write operations. Read operations must always come before any write operations.
The transaction read a document that was modified outside of the transaction. In this case, the transaction automatically runs again. The transaction is retried a finite number of times.
The transaction exceeded the maximum request size of 10 MiB.
Transaction size depends on the sizes of documents and index entries modified by the transaction. For a delete operation, this includes the size of the target document and the sizes of the index entries deleted in response to the operation.
A failed transaction returns an error and does not write anything to the database. You do not need to roll back the transaction; Cloud Firestore does this automatically.
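Putting that together, here is a sketch with the Python server client, where the library re-runs the function on contention up to max_attempts times (the seats collection and reserved field are assumptions):

```python
from google.cloud import firestore

db = firestore.Client()

# Server client libraries lock the documents a transaction reads; on
# contention the library re-runs the function, up to max_attempts times.
transaction = db.transaction(max_attempts=5)

@firestore.transactional
def reserve_seat(transaction, ref):
    snap = ref.get(transaction=transaction)  # read (and lock) the document
    if snap.get("reserved"):                 # assumes the field exists
        raise ValueError("seat already taken")
    transaction.update(ref, {"reserved": True})

reserve_seat(transaction, db.collection("seats").document("12A"))
```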

How does SQLite prevent deadlocks with deferred transactions?

According to the documentation on deferred transactions:
The default transaction behavior is deferred. (...) The first read operation against a database creates a SHARED lock and the first write operation creates a RESERVED lock.
Also according to the documentation on locks:
Any number of processes can hold SHARED locks at the same time (...) Only a single RESERVED lock may be active at one time, though multiple SHARED locks can coexist with a single RESERVED lock
This sounds like a multiple readers/single writer lock with arbitrary reader-to-writer promotion mechanism, which is known to be a deadlock hazard:
A starts transaction
B starts transaction
A acquires SHARED lock and reads something
B acquires SHARED lock and reads something
A acquires RESERVED lock and prepares to write something. It can't write as long as there are other SHARED locks so it blocks.
B wishes to write so tries to take RESERVED lock. There is already another RESERVED lock so it blocks until it is released, still holding the SHARED lock.
Deadlock.
So how does SQLite get around this? Two possible solutions come to my mind, but both of them seem to break the whole idea of a transaction:
Would-be writers release their SHARED locks before acquiring RESERVED. This would break atomicity between the reads and the writes.
B doesn't block when trying to take a RESERVED lock, but errors out instead. This would mean all the reads would need to be repeated, and it significantly complicates API usage.
Am I missing something? How does SQLite deal with this? Why would this seemingly dangerous type of transaction be the default?
By simple trial and error, I discovered that they took the error-out route.
In the given scenario, when B tries to take RESERVED, it will first wait for PRAGMA busy_timeout milliseconds. Then it will report Error: database is locked. The transaction will still be active, so an immediate retry is possible.
If A afterwards tries to COMMIT (or if it runs out of in-memory cache), it will take the PENDING lock (preventing additional SHARED locks) and then wait for EXCLUSIVE. If some SHARED locks remain after PRAGMA busy_timeout milliseconds, it will report Error: database is locked. The transaction will still be active, so an immediate retry is possible.
In other words, the deadlock prevention mechanism in use is a timeout. However, it does require API users to cooperate by rolling back and trying again (see the sketch after the guidelines below).
As a guideline:
Use just BEGIN TRANSACTION (or explicitly BEGIN DEFERRED TRANSACTION) when you only expect to read. Writes could possibly fail, forcing you to roll back and retry the entire transaction.
Use BEGIN IMMEDIATE TRANSACTION when you expect to maybe write at some point. This will block all other writers and all other immediate maybe-writers.
BEGIN EXCLUSIVE TRANSACTION will immediately block until all other locks are released. I have no idea why anyone would want this. Possibly to prepare for some data which needs to be written to disk as quickly as possible once it arrives? EDIT: It seems to be the only way to prevent timeouts at arbitrary points after beginning a transaction.
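To illustrate the rollback-and-retry cooperation described above, here is a sketch using Python's built-in sqlite3 module (the accounts table is an assumption):

```python
import sqlite3
import time

def debit_with_retry(path, attempts=3):
    # isolation_level=None puts sqlite3 in autocommit mode, so we issue
    # BEGIN/COMMIT/ROLLBACK ourselves.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA busy_timeout = 2000")  # wait up to 2 s on a lock
    for attempt in range(attempts):
        try:
            # IMMEDIATE takes the RESERVED lock up front, so this
            # transaction cannot hit the deferred lock-upgrade trap.
            conn.execute("BEGIN IMMEDIATE")
            conn.execute(
                "UPDATE accounts SET balance = balance - 20 WHERE id = ?", (1,)
            )
            conn.execute("COMMIT")
            return
        except sqlite3.OperationalError:  # e.g. "database is locked"
            if conn.in_transaction:
                conn.execute("ROLLBACK")  # cooperate: roll back, then retry
            time.sleep(0.1 * (attempt + 1))
    raise RuntimeError("could not commit after %d attempts" % attempts)
```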

BizTalk Transactions - Atomic/Long Running

I need some input on the practical use of BizTalk Atomic and Long Running transactions. I have read all the theory, but I am not sure how an Atomic transaction will work if I am making multiple SQL calls, and if one SQL call fails, how the previously committed transactions/data will be rolled back.
I need some guide/link/pointer to understand these transactions better.
BizTalk version used: 2010
The main difference is that an orchestration is never persisted during an Atomic transaction, even when sending data to the message box; everything happens in a single transaction established by DTC. In fact, a message sent from inside an Atomic transaction isn't really delivered to the MessageBox until the transaction completes: it's written but not committed.
Another difference is that an Atomic transaction automatically rolls back everything inside it in case of failure, so you can be sure all the actions inside are applied at once or not at all.
In reality, Atomic transactions have too many limitations and are quite an exotic way to do things in BizTalk. I've implemented a lot of solutions with BizTalk but have never used an Atomic transaction so far. I do, however, use Long Running transactions a lot, either to force the orchestration to persist some intermediate state (which happens at the end of any transaction scope) or to define compensation actions.
See this blog Indetail About Atomic Scope / Transactions in BizTalk Server
In particular:
Please note that, the scope of BizTalk with respect to Atomic Transaction is till BizTalk Server Message Box only. Please consider this before you decide to use Atomic scope.
So regarding the multiple SQL calls, I don't think you can do it that way with an Atomic shape unless you aren't dependent on the results of the first SQL call. If you want a single SQL transaction that can be rolled back, you are better off doing that in a stored procedure that BizTalk calls.
About the only time I've used Atomic scopes is when having to call a Pipeline from inside an Orchestration or calling BRE.
