Async commits associated with a KafkaListener are not retried in case of RetriableCommitFailedException - spring-kafka

Async commits associated with a KafkaListener are not retried in case of RetriableCommitFailedException, whereas sync commits are (based on ContainerProperties#commitRetries).
Is there any reason or constraint not to do this for asynchronous commits as well?

Retrying async commits caused problems: we ended up committing old offsets for partitions that already had later offsets committed. It could also cause stack overflows at high failure rates.
https://github.com/spring-projects/spring-kafka/pull/2088
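For reference, a minimal sketch of opting into synchronous commits with bounded retries via ContainerProperties (the topic name and retry count here are placeholders):

import org.springframework.kafka.listener.ContainerProperties;

ContainerProperties containerProps = new ContainerProperties("my-topic"); // placeholder topic
containerProps.setSyncCommits(true);  // synchronous commits honour commitRetries
containerProps.setCommitRetries(3);   // retried on RetriableCommitFailedException (sync commits only)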

Related

Firestore transaction contention but with different documents

For example, I have 10 documents in my collection. 10 requests come in at nearly the same second and run the same query. Each request starts its own transaction that tries to read one document and then delete it. From the Firestore documentation on document contention, it appears that contention errors happen when more than one transaction occurs on the same document X number of times (how many times is not documented).
Cloud Firestore resolves data contention by delaying or failing one of the operations.
https://cloud.google.com/firestore/docs/transaction-data-contention
However, since one of those transactions committed in this case, I assume the other 9 that tried to operate on the same document will retry because the document from the query was "changed" before they could commit. The next 9 transactions will then try to do the same thing on another document, and this will continue until all requests have finished deleting one document each and there are no more active transactions.
Would these repeatedly retried transactions eventually be ABORTED due to contention, even though the contended document is different each time? Or would they just keep getting delayed and retried because the contention happens on a different document on each attempt?
According to the documentation, the transaction will retry a "finite number of times". This number is dependent on how the SDK itself is configured, which may be different for various SDK platforms and versions. It doesn't matter which contended document(s) caused the retry. The max number of retries is absolute for that transaction in order to avoid excessive work.
Newer versions of the SDK allow configuration of the number of retries (e.g. Android 24.4.0 lets you specify TransactionOptions).
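On Android, a minimal sketch of that configuration might look like the following (assuming the TransactionOptions builder API; the retry count of 3 is arbitrary):

import com.google.firebase.firestore.FirebaseFirestore;
import com.google.firebase.firestore.TransactionOptions;

FirebaseFirestore db = FirebaseFirestore.getInstance();
TransactionOptions options = new TransactionOptions.Builder()
        .setMaxAttempts(3) // total attempts allowed before the transaction fails as ABORTED
        .build();

db.runTransaction(options, transaction -> {
    // read-then-delete logic goes here
    return null;
});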

How to avoid race during Kafka rebalance in concurrent processing with reactor Kafka?

I'm using 'reactor-kafka' with out-of-order commits and interval commits (manually acknowledging each message). I'm wondering what happens to messages that were polled before a rebalance but are still being processed asynchronously on another thread (using publishOn(Schedulers.parallel()) after kafkaReceiver.receive()) when the rebalance occurs.
Will they be committed after the rebalance, while the partition may already be consumed by a new consumer? I want to avoid this situation, since it can lead to the same event being processed by two consumers at the same time, which can cause races and conflicts (that I need to avoid).
I'll be fine with processing an event that was polled before the rebalance as long as I don't acknowledge and commit it, because the new consumer will then process this message again after the rebalance anyway (I'm working with an 'at least once' strategy, so that's fine).
How can I achieve this behaviour? Is checking whether the event's source partition still belongs to the assigned partitions before acknowledging a good option?
Or is there any way of forcing the acknowledge function to fail if the event comes from an old partition that is no longer assigned to the consumer?
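A minimal sketch of that partition-check idea, assuming reactor-kafka's assign/revoke listeners (baseOptions is a hypothetical pre-built ReceiverOptions, and note that a small check-then-acknowledge race remains, since a partition can still be revoked between the check and the acknowledge):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.common.TopicPartition;
import reactor.core.scheduler.Schedulers;
import reactor.kafka.receiver.KafkaReceiver;
import reactor.kafka.receiver.ReceiverOptions;

void consume(ReceiverOptions<String, String> baseOptions) {
    // Track the live assignment via the rebalance listeners.
    Set<TopicPartition> assigned = ConcurrentHashMap.newKeySet();

    ReceiverOptions<String, String> options = baseOptions
            .addAssignListener(parts -> parts.forEach(p -> assigned.add(p.topicPartition())))
            .addRevokeListener(parts -> parts.forEach(p -> assigned.remove(p.topicPartition())));

    KafkaReceiver.create(options)
            .receive()
            .publishOn(Schedulers.parallel())
            .subscribe(rec -> {
                // ... process the record here ...
                // Acknowledge only if the record's partition is still assigned to us.
                if (assigned.contains(rec.receiverOffset().topicPartition())) {
                    rec.receiverOffset().acknowledge();
                }
            });
}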

Cloud Datastore transaction terminated without explicit rollback defined

From the following document: https://cloud.google.com/datastore/docs/concepts/transactions
What would happen if a transaction fails with no explicit rollback defined? For example, if we're performing a put() operation on value arguments.
The documentation states that transactions should be idempotent. What does this mean with respect to a put() operation? It is not clear how idempotency applies in this context.
How do we detect failure if, according to the documentation, a failure reported by commit is not reliable?
We are seeing some symptoms where a put() against value arguments sometimes partially saves the data. Note that we do not have an explicit rollback defined.
As you may already know, Datastore transactions are guaranteed to be atomic, which means that it applies the all-or-nothing principle; either all operations succeed or they all fail. This ensures that the data in your database remains consistent over time.
Now, regardless of whether you execute put() or any other operation in your transaction, your code should always ensure that the transaction has either successfully committed or been rolled back. This means that if you aren't fully sure whether the commit succeeded, you should explicitly issue a rollback.
However, there may be some exceptions where a commit might fail, and this doesn't necessarily mean that no data was written to your database. The documentation even points out that "you can receive errors in cases where transactions have been committed."
The simple way to detect transaction failures is to add a try/catch block to your code for when an Exception (a failed transactional operation) or a DatastoreException (Datastore-related errors, such as a failed commit) is thrown. I believe you may already have an answer in this Stack Overflow post about this particular question.
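A minimal sketch of that pattern with the Java client library (a hypothetical illustration; the entity to write is passed in as a placeholder):

import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreException;
import com.google.cloud.datastore.FullEntity;
import com.google.cloud.datastore.Transaction;

void saveWithRollback(Datastore datastore, FullEntity<?> entity) {
    Transaction txn = datastore.newTransaction();
    try {
        txn.put(entity);
        txn.commit();
    } catch (DatastoreException e) {
        // Handle/log the failed commit; note the write may still have been applied server-side.
    } finally {
        if (txn.isActive()) {
            txn.rollback(); // explicitly roll back if the commit was never reached
        }
    }
}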
A good practice is to make your transactions idempotent whenever possible. In other words, if a transaction that includes a write operation put() fails and needs to be retried, the end result should ideally remain the same.
A real-world example: you're trying to transfer some money to your friend. The transaction consists of withdrawing 20 USD from your bank account and depositing the same amount into your friend's bank account. If the transaction fails and has to be retried, it should still operate with the same amount of money (20 USD) as the final result.
Keep in mind that the Datastore API doesn't retry transactions by default, but you can add your own retry logic to your code, as per the documentation.
In summary, if a transaction is interrupted and your logic doesn't handle the failure accordingly, you may eventually see inconsistencies in the data of your database.

How to roll back a DB transaction after the HTTP connection is lost

Recently at an interview, the interviewer asked me the following question:
Suppose a request is sent to a servlet, and the servlet performs several DB transactions (first an update and commit, then a read, another update, and a second commit), which takes around 3-4 minutes. During that period the user presses the cancel button and the connection is lost. How would you roll back the entire transaction?
My answer was: since the servlet throws an IOException, we can handle the exception and roll back the transaction.
But again he questioned me: what about the DB commits which are already done, how would you roll those back?
I was blank and replied that I had never come across that situation. But I would really like to know what could be done in such a case.
Thanks.
But again he questioned me: what about the DB commits which are already done, how would you roll those back?
I think this was not a servlet-related question. If the transaction was committed in the database, you cannot roll it back. A database transaction has several properties known as ACID (Atomicity, Consistency, Isolation, Durability). The one that applies in this case is Durability:
"Durability is the ACID property which guarantees that transactions that have committed will survive permanently"
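The practical takeaway can be sketched with plain JDBC (a hypothetical illustration, not from the original thread): keep all the statements in a single transaction and commit once at the end, so a failure at any point can still roll everything back. Once commit() returns, durability makes the changes permanent.

import java.sql.Connection;
import java.sql.SQLException;

void handleRequest(Connection conn) throws SQLException {
    try {
        conn.setAutoCommit(false);
        // ... first update ...
        // ... read and second update ...
        conn.commit(); // single commit at the very end
    } catch (SQLException e) {
        conn.rollback(); // nothing has been made durable yet
        throw e;
    }
}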

What are common reasons for deadlocks?

Deadlocks are hard to find and very uncomfortable to remove.
How can I find error sources for deadlocks in my code? Are there any "deadlock patterns"?
In my special case, it deals with databases, but this question is open for every deadlock.
Update: This recent MSDN article, Tools And Techniques to Identify Concurrency Issues, might also be of interest.
Stephen Toub in the MSDN article Deadlock monitor states the following four conditions necessary for deadlocks to occur:
A limited number of a particular resource. In the case of a monitor in C# (what you use when you employ the lock keyword), this limited number is one, since a monitor is a mutual-exclusion lock (meaning only one thread can own a monitor at a time).
The ability to hold one resource and request another. In C#, this is akin to locking on one object and then locking on another before releasing the first lock, for example:
lock(a)
{
    ...
    lock(b)
    {
        ...
    }
}
No preemption capability. In C#, this means that one thread can't force another thread to release a lock.
A circular wait condition. This means that there is a cycle of threads, each of which is waiting for the next to release a resource before it can continue.
He goes on to explain that the way to avoid deadlocks is to avoid (or thwart) condition four.
Joe Duffy discusses several techniques for avoiding and detecting deadlocks, including one known as lock leveling. In lock leveling, locks are assigned numerical values, and threads must only acquire locks that have higher numbers than locks they have already acquired. This prevents the possibility of a cycle. It's also frequently difficult to do well in a typical software application today, and a failure to follow lock leveling on every lock acquisition invites deadlock.
The classic deadlock scenario: A is holding lock X and wants to acquire lock Y, while B is holding lock Y and wants to acquire lock X. Since neither can complete what it is trying to do, both will end up waiting forever (unless timeouts are used).
In this case a deadlock can be avoided if A and B acquire the locks in the same order.
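A minimal sketch of that ordering fix (the Account class and id field are hypothetical; any stable global order works):

class Account { long id; }

// Both threads lock the smaller id first, so no circular wait can form.
void transfer(Account a, Account b) {
    Account first = a.id < b.id ? a : b;
    Account second = a.id < b.id ? b : a;
    synchronized (first) {
        synchronized (second) {
            // ... move money between a and b ...
        }
    }
}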
No deadlock patterns to my knowledge (and that's after 12 years of writing heavily multithreaded trading applications). But the TimedLock class has been of great help in finding deadlocks that exist in code without massive rework.
http://www.randomtree.org/eric/techblog/archives/2004/10/multithreading_is_hard.html
Basically, (in .NET/C#) you search/replace all your "lock(xxx)" statements with "using (TimedLock.Lock(xxx))".
If a deadlock is ever detected (lock unable to be obtained within the specified timeout, defaults to 10 seconds), then an exception is thrown. My local version also immediately logs the stacktrace. Walk up the stacktrace (preferably debug build with line numbers) and you'll immediately see what locks were held at the point of failure, and which one it was attempting to get.
In dotnet 1.1, in a deadlock situation as described, as luck would have it all the threads which were locked would throw the exception at the same time. So you'd get 2+ stacktraces, and all the information necessary to fix the problem. (2.0+ may have changed the threading model internally enough to not be this lucky, I'm not sure)
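The TimedLock helper is .NET-specific, but a rough Java analogue of the same idea (a hypothetical helper built on ReentrantLock.tryLock with a timeout) might look like:

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Fail fast with an exception (and hence a stack trace) instead of
// blocking forever when a lock cannot be acquired in time.
void withTimedLock(ReentrantLock lock, Runnable work) throws InterruptedException {
    if (!lock.tryLock(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("Possible deadlock: lock not acquired within 10s");
    }
    try {
        work.run();
    } finally {
        lock.unlock();
    }
}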
Making sure all transactions affect tables in the same order is the key to avoiding the most common of deadlocks.
For example:
Transaction A:
    UPDATE TableA SET Foo = 'Bar'
    UPDATE TableB SET Bar = 'Foo'
Transaction B:
    UPDATE TableB SET Bar = 'Foo'
    UPDATE TableA SET Foo = 'Bar'
This is extremely likely to result in a deadlock: Transaction A gets a lock on TableA and Transaction B gets a lock on TableB, so neither can obtain the lock for its second statement until the other has finished.
All other forms of deadlock are generally caused by high-intensity use and by SQL Server deadlocking internally while allocating resources.
Yes - deadlocks occur when processes try to acquire resources in random order. If all your processes try to acquire the same resources in the same order, the possibilities for deadlocks are greatly reduced, if not eliminated.
Of course, this is not always easy to arrange...
The most common (according to my unscientific observations) DB deadlock scenario is very simple:
Two processes read something (a DB record, for example) and both acquire a shared lock on the associated resource (usually a DB page).
Both then try to make an update, attempting to upgrade their locks to exclusive ones - voila, deadlock.
This can be avoided by specifying the "FOR UPDATE" clause (or similar, depending on your particular RDBMS) if the read is to be followed by an update. This way the process gets the exclusive lock from the start, making the above scenario impossible.
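As a hedged JDBC sketch of that fix (the stock table and column names are made up, and the exact locking clause varies by RDBMS):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// SELECT ... FOR UPDATE takes the exclusive row lock up front, so two
// transactions can never both hold shared locks and deadlock upgrading them.
void readThenUpdate(Connection conn, long id) throws SQLException {
    conn.setAutoCommit(false);
    try (PreparedStatement select = conn.prepareStatement(
            "SELECT qty FROM stock WHERE id = ? FOR UPDATE")) {
        select.setLong(1, id);
        try (ResultSet rs = select.executeQuery()) {
            // ... compute the new quantity from rs ...
        }
    }
    try (PreparedStatement update = conn.prepareStatement(
            "UPDATE stock SET qty = qty - 1 WHERE id = ?")) {
        update.setLong(1, id);
        update.executeUpdate();
    }
    conn.commit();
}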
I recommend reading this article by Herb Sutter. It explains the reasons behind deadlocking issues and puts forward a framework for tackling the problem.
The typical scenario is mismatched update plans (tables not always updated in the same order). However, it is not unusual to see deadlocks under high processing volume.
I tend to accept deadlocks as a fact of life; they will happen one day or another, so I have my DAL prepared to handle and retry a deadlocked operation, as sketched below.
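As a hedged illustration of that retry idea (assuming SQL Server, which reports the chosen deadlock victim with vendor error code 1205; the retry limit of 3 is arbitrary):

import java.sql.SQLException;
import java.util.concurrent.Callable;

// Retry the whole unit of work a bounded number of times when it is
// chosen as a deadlock victim; rethrow anything else immediately.
<T> T withDeadlockRetry(Callable<T> work) throws Exception {
    for (int attempt = 1; ; attempt++) {
        try {
            return work.call();
        } catch (SQLException e) {
            if (e.getErrorCode() != 1205 || attempt >= 3) {
                throw e; // not a deadlock, or out of retries
            }
        }
    }
}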
A deadlock is a condition that occurs when two processes are each waiting for the other to complete before proceeding. The result is that both processes hang. It is most common in multitasking and client/server environments.
Deadlocks occur mainly when multiple dependent locks exist, for example when one thread locks a mutex and another thread tries to lock the same mutexes in the reverse order. One should pay attention to how mutexes are used in order to avoid deadlocks.
Be sure to complete the operation before releasing the lock. If you hold multiple locks, for example acquired in the order A, B, C, keep the releasing order consistent as well.
In my last project I faced a problem with deadlocks in a SQL Server database. The problem in finding the cause was that my software and a third-party software were using the same database and working on the same tables. It was very hard to find out what was causing the deadlocks. I ended up writing a SQL query to find out which processes and which SQL statements were causing the deadlocks. You can find that statement here: Deadlocks on SQL-Server
To avoid deadlock, there is an algorithm called the Banker's algorithm.
This one also provides helpful information on avoiding deadlock.
