I have a use case where I need to consume from a Kafka topic, do some work, produce to another Kafka topic with exactly-once semantics, and save to a Mongo database. After going through the docs, what I figured out is that the Kafka transaction and the Mongo transaction can be synchronized, but they are still two different transactions. In the scenario below, if the Mongo commit fails, is there a way to roll back the Kafka record that was committed to the topic and have it replayed from the consumer?
producer.send()
producer.sendOffsetsToTransaction()
mongoDao.commit()
If the listener throws an exception, the Kafka transaction will be rolled back and the record will be redelivered.
If the Mongo commit succeeds and the Kafka commit fails, you will need to deal with a duplicate delivery.
If you wire a KafkaTransactionManager (or a ChainedKafkaTransactionManager containing one) into the listener container, you don't need to send the offsets to the transaction yourself; the container will do it for you before committing.
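For reference, a rough sketch (not from the original answer) of wiring a chained transaction manager into the container factory; the Mongo transaction manager and all bean names here are assumptions:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.MongoTransactionManager;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.transaction.ChainedKafkaTransactionManager;
import org.springframework.kafka.transaction.KafkaTransactionManager;

@Configuration
public class KafkaTxConfig {

    @Bean
    public ChainedKafkaTransactionManager<Object, Object> chainedTm(
            KafkaTransactionManager<Object, Object> kafkaTm, MongoTransactionManager mongoTm) {
        // Transactions start in the declared order and commit in reverse order,
        // so here Mongo commits first, then the Kafka transaction.
        return new ChainedKafkaTransactionManager<>(kafkaTm, mongoTm);
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<Object, Object> kafkaListenerContainerFactory(
            ConsumerFactory<Object, Object> cf, ChainedKafkaTransactionManager<Object, Object> tm) {
        ConcurrentKafkaListenerContainerFactory<Object, Object> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(cf);
        // The container sends the consumed offsets to the Kafka transaction before committing.
        factory.getContainerProperties().setTransactionManager(tm);
        return factory;
    }
}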
I am using spring-kafka 2.2.8 and trying to understand the main difference between a single-record consumer and a batch consumer.
As far as I understand, reading messages/bytes from a topic isn't any different for a single-record consumer versus a batch consumer. The only difference is how the offset is committed, and hence the error handling. Is my understanding correct? Please confirm.
With a record-based listener, the records returned by the poll are handed to the listener one at a time. The container can be configured to commit the offsets one-at-a-time, or after all records are processed (default).
With a batch listener, the records returned by the poll are all handed to the listener in one call.
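Purely for illustration (not part of the answer), two listeners showing the difference in method signature; the topic, group, and container factory names are made up, and the batch factory is assumed to have setBatchListener(true):

import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class Listeners {

    // Record listener: invoked once per record returned by the poll.
    @KafkaListener(topics = "my-topic", groupId = "record-group")
    public void one(ConsumerRecord<String, String> record) {
        // process a single record
    }

    // Batch listener: invoked once per poll with all the records it returned.
    @KafkaListener(topics = "my-topic", groupId = "batch-group", containerFactory = "batchFactory")
    public void many(List<ConsumerRecord<String, String>> records) {
        // process the whole batch
    }
}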
I'm designing a system with multiple bounded contexts (microservices). I will have two kinds of events.
Domain Events, which happens "in memory" within single transaction (sync)
Integration Events, which are used between bounded contexts (async)
My problem is how to make sure that, once the transaction is committed (at this point I'm sure all Domain Events were processed successfully), the Integration Events are successful as well.
When my transaction is committed, normally I will dispatch the Integration Events (e.g. to the queue), but there is a possibility that this queue is down as well, so the just-committed transaction would have to be "reverted". How?
The only solution that comes to my mind is to store the Integration Events in the same DB, within the same transaction, and then process those records and push them to the queue - something like using the current DB as a pre-queue before pushing to the real queue (however, I read that using the DB for this is an anti-pattern).
Is there any pattern (reliable approach) to make sure that the transaction commit and pushing the message to the queue happen as an atomic operation?
EDIT
After reading https://devblogs.microsoft.com/cesardelatorre/domain-events-vs-integration-events-in-domain-driven-design-and-microservices-architectures/ , the author actually suggests the approach of a "pre-queue" in the same DB (he calls it "ready to publish the event").
Check out the transactional outbox pattern.
This pattern does create a pre-queue, but the nice part is that pushing messages from the pre-queue to the real queue is fully decoupled. Instead, you have a middleman called a message relay that reads your transaction log (or polls the outbox table) and pushes the events to the real queue. Since sending the message and your domain events are fully decoupled, you can do all your domain events in a single transaction.
And make sure that all your services are idempotent (same result despite duplicate calls). The transactional outbox pattern does guarantee that messages are published, but if the message relay fails just after publishing (before acknowledging), it will publish the same event again.
Idempotent services are also necessary in other scenarios, as the event bus (the real queue) can have the same issue: the event bus propagates an event, the service acknowledges, then a network error occurs; since the acknowledgement never reaches the event bus, the same event is sent again.
Actually, idempotence alone could solve the whole issue. After the domain-event computation completes (in a single transaction), if publishing the message fails the service can simply throw an error without rolling back. Since the event is not acknowledged, the event bus will send the same event again. And since the service is idempotent, the same database transaction will not happen twice; it will basically overwrite, or better, skip it and move directly to publishing the message and acknowledging.
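To make the outbox idea concrete, here is a minimal sketch under assumed Spring/JPA-style repositories; OrderRepository, OutboxRepository, OutboxEvent, the topic name, and the relay schedule are all hypothetical, not part of the pattern itself:

import java.util.List;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class OrderService {

    private final OrderRepository orders;                 // hypothetical repositories
    private final OutboxRepository outbox;
    private final KafkaTemplate<String, String> kafkaTemplate;

    public OrderService(OrderRepository orders, OutboxRepository outbox,
            KafkaTemplate<String, String> kafkaTemplate) {
        this.orders = orders;
        this.outbox = outbox;
        this.kafkaTemplate = kafkaTemplate;
    }

    @Transactional
    public void placeOrder(Order order) {
        orders.save(order);
        // The integration event is stored in the same DB transaction as the state change,
        // so either both are committed or neither is.
        outbox.save(new OutboxEvent("OrderPlaced", order.toJson()));
    }

    // A separate message relay (a polling job here; transaction-log tailing works too)
    // pushes outbox rows to the real queue. It may crash between send and markPublished,
    // which is exactly why consumers must be idempotent.
    @Scheduled(fixedDelay = 1000)
    public void relay() {
        List<OutboxEvent> pending = outbox.findUnpublished();
        for (OutboxEvent event : pending) {
            kafkaTemplate.send("integration-events", event.getPayload());
            event.markPublished();
            outbox.save(event);
        }
    }
}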
I'm using Spring Boot 2.1.7.RELEASE and spring-kafka 2.2.8.RELEASE. I'm using the @KafkaListener annotation to create a consumer, with all the default consumer settings.
Now, in my consumer, the processing logic includes a DB call, and I'm sending the record to a DLT if there is an error/exception during processing.
With this setup, if the DB is down for a few minutes for some reason, I want to pause/stop my consumer from consuming more records; otherwise it keeps consuming messages, hitting the DB exception, and eventually filling up my DLT, which I don't want to happen until the DB is back (based on some health check).
Now I have a few questions.
Does spring-kafka provide an option to trigger infinite retry based on the exception type (in this case a DB exception, but I want to add a few more exception types based on my consumer logic)?
Does spring-kafka provide an option to trigger message consumption based on a condition?
There is a ContainerStoppingErrorHandler but it will stop the container for all exceptions.
You would need to create a custom error handler that stops (or pauses) the container after a specific failure as well as some mechanism to restart (or resume) the container.
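A rough sketch of such a custom handler (not from the answer), assuming spring-kafka 2.2.x; the choice of DataAccessResourceFailureException as the "DB is down" signal and the seek-to-current fallback are assumptions:

import java.util.List;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.dao.DataAccessResourceFailureException;
import org.springframework.kafka.listener.ContainerAwareErrorHandler;
import org.springframework.kafka.listener.ContainerStoppingErrorHandler;
import org.springframework.kafka.listener.MessageListenerContainer;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;

public class SelectiveStoppingErrorHandler implements ContainerAwareErrorHandler {

    private final ContainerStoppingErrorHandler stopper = new ContainerStoppingErrorHandler();
    private final SeekToCurrentErrorHandler fallback = new SeekToCurrentErrorHandler();

    @Override
    public void handle(Exception thrownException, List<ConsumerRecord<?, ?>> records,
            Consumer<?, ?> consumer, MessageListenerContainer container) {
        // Stop the container only when the failure looks like a DB outage;
        // otherwise seek back to the current record and retry as usual.
        if (thrownException.getCause() instanceof DataAccessResourceFailureException) {
            stopper.handle(thrownException, records, consumer, container);
        } else {
            fallback.handle(thrownException, records, consumer, container);
        }
    }
}

It could then be set on the listener container factory via setErrorHandler(...); restarting the container after a DB health check passes would still need separate wiring.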
What happens if a transaction fails and the application crashes for other reasons and the transaction is not rolled back?
Also, what happens in that case, and how should rollback failures be treated?
You don't have to worry about the impact of your app's crashes on transaction rollbacks (or any other stateful datastore operation).
The application just sends RPC requests for the operations. The actual execution of the operation steps/sequence happens on the datastore backend side, not inside your application.
From Life of a Datastore Write:
We'll dive into a bit more detail in terms of what new data is placed in the datastore as part of write operations such as inserts, deletions, updates, and transactions. The focus is on the backend work that is common to all of the runtimes.
...
When we call put or makePersistent, several things happen behind the scenes before the call returns and sets the entity's key:
The my_todo object is converted into a protocol buffer.
The appserver makes an RPC call to the datastore server, sending the entity data in a protocol buffer.
If a key name is not provided, a unique ID is determined for this entity's key. The entity key is composed of app ID | ancestor keys | kind name | key name or ID.
The datastore server processes the request in two phases that are executed in order: commit, then apply. In each phase, the datastore server identifies the Bigtable tablet servers that should receive the data.
Now, depending on the client library you use, transaction rollback could be entirely automatic (in the ndb Python client library, for example) or could be your app's responsibility. But even if it is your app's responsibility, it's a best-effort attempt anyway. Crashing without requesting a rollback would simply mean that some potentially pending operations on the backend side will eventually time out instead of being actively ended. See also the related GAE: How to rollback a transaction?
I'm using WebSphere 8.5 with EJB 3.1 and JMS Generic provider.
I need to write messages to a queue using a stateless session bean as a producer. The EJB is annotated with TransactionAttributeType.REQUIRED because I need to perform a "DB insert" before I send messages to the queue, and then consume those messages, reading the records written by the producer.
The problem is that if I define a non-XA JDBC datasource, the producer writes the messages to the queue, but the server complains about a failed two-phase commit of a local resource (the datasource itself, I think) and doesn't call the onMessage method of the MDB. If I define an XA JDBC datasource, everything works.
My questions:
Is the JMS session required to be an XA resource? And why?
What happens if I configure my JMS connection factory to create a non-XA JMS session in a JTA transaction? Is that bad practice?
What happens if the consumer starts to consume messages while the producer is still finishing its operations on the database? Would the consumer see the changes to the database because they are in the same transaction?
Thanks in advance, regards
Is the JMS session required to be an XA resource? And why?
You need both resources to be XA. This is a distributed transaction across two different resources - a database and a JMS queue. To participate in one and the same transaction, they both must be XA (there is an option to have one non-XA resource in the transaction, using last participant support, but I wouldn't recommend that).
If your resources are not XA, you may set the bean to NOT_SUPPORTED and handle the transactions yourself - that is, manage two separate transactions, the first for the database and the second for the JMS queue. However, since the DB transaction will be committed first, you would have to code a compensating action for it in case sending the message fails (as you cannot roll back anymore), to avoid a situation where the database state has changed but you didn't send the message.
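As a rough illustration (not from the answer) of that two-separate-transactions approach with a compensating action; OrderDao, the JNDI names, and the payload handling are hypothetical, and the sketch assumes the classic JMS 1.1 API available on WebSphere 8.5:

import javax.annotation.Resource;
import javax.ejb.EJB;
import javax.ejb.EJBException;
import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

@Stateless
public class OrderSender {

    @Resource(lookup = "jms/myConnectionFactory")   // hypothetical JNDI names
    private ConnectionFactory connectionFactory;

    @Resource(lookup = "jms/myQueue")
    private Queue queue;

    @EJB
    private OrderDao dao;   // hypothetical DAO whose methods run in their own (REQUIRES_NEW) transactions

    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    public void insertAndSend(String payload) {
        dao.insert(payload);   // first unit of work: the DB insert commits when the DAO method returns
        try {
            Connection connection = connectionFactory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                producer.send(session.createTextMessage(payload));   // second unit of work: the JMS send
            } finally {
                connection.close();
            }
        } catch (Exception e) {
            // The DB commit cannot be rolled back anymore, so compensate instead.
            dao.delete(payload);   // hypothetical compensating action
            throw new EJBException("Message send failed, DB change compensated", e);
        }
    }
}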
What happens if I configure my JMS connection factory to create a non-XA JMS session in a JTA transaction?
If another resource is part of that transaction (e.g. a database), you will get an exception about two-phase commit support.
What happens if the consumer starts to consume messages while the producer is still finishing its operations on the database?
It's not clear to me what you are asking. If the producer first writes to the database and then writes to the queue in one XA transaction, both will be committed at the same time, so the consumer will not be able to see the message first.
However, if you create two separate transactions (one for DB access, the second for queue access), you could have a situation, if you commit the queue first, where the consumer reads the message. But in that case, the consumer will not be able to see the changes to the DB if they are not yet committed.
Would the consumer see changes to the database because they are in the same transaction?
The producer and consumer are not in the same transaction (the producer creates the message and commits; the consumer starts a separate transaction to read it).