How to Improve Performance of Kafka Producer when used in Synchronous Mode - asynchronous

I have developed a Kafka version : 0.9.0.1 application that cannot afford to lose any messages.
I have a constraint that the messages must be consumed in the correct sequence.
To ensure I do not loose any messages I have implemented Retries within my application code and configured my Producer to ack=all.
To enforce exception handling and to Fail Fast I immediately get() on the returned Future from Producer.send(), e.g.
final Future<RecordMetadata> futureRecordMetadata = KAFKA_PRODUCER.send(producerRecord);
futureRecordMetadata.get();
This approach works fine for guaranteeing the delivery of all messages, however the performance is completely unacceptable.
For example it takes 34 minutes to send 152,125 messages with ack=all.
When I comment out the futureRecordMetadata.get(), I can send 1,089,125 messages in 7 minutes.
When I change ack=all to ack=1 I can send 815,038 in 30 minutes. Why is there such a big difference between ack=all and ack=1?
However by not blocking on the get() I have no way of knowing if the message arrived safely.
I know I can pass a Callback into the send and have Kafka retry for me, however this approach has a drawback that messages may be consumed out of sequence.
I thought request.required.acks config could save the day for me, however when I set any value for it I receive this warning
130 [NamedConnector-Monitor] WARN org.apache.kafka.clients.producer.ProducerConfig - The configuration request.required.acks = -1 was supplied but isn't a known config.
Is it possible to asynchronously send Kafka messages, with a guarantee they will ALWAYS arrive safely and in the correct sequence?
UPDATE 001
Is there anyway I can consume messages in kafka message KEY order direct from the TOPIC?
Or would I have to consume messages in offset order then sort programmatically
to Kafka message Key order?

If you expect a total order, the send performance is bad. (actually total order scenario is very rare).
If Partition order are acceptable, you can use multiple thread producer. One producer/thread for each partition.

Related

How to avoid race during Kafka rebalance in concurrent processing with reactor Kafka?

I'm using 'reactor-kafka' out-of-order commits and interval commits (using manual acknowledge for each message). I'm wondering what will happen if I process messages that I polled before the rebalance, while they are currently processed asynchronously in another thread (using publishOn(Schedulers.parallel()) after the kafkaReceiver.receive()).
Will they be committed after the rebalance while the partition may be consumed by a new consumer? I want to avoid this situation, which can lead to processing event at the same time by two consumers, since that can cause race and conflicts (that I need to avoid).
I'll be ok with processing an event that was polled before the rebalance only if I don't acknowledge and commit it. That's because then the new consumer will process this message after the rebalance either (I'm working with 'at least once' strategy so that's fine).
How can I achieve this behaviour? Does checking if the event's source partition belong to the assigned partitions before acknowledging is a good option?
Or is there any way of forcing the acknowledge function to fail if it's an event of old partition that is not currently assigned by the consumer?

How to make BizTalk only take one message at a time from the MSMQ

I have a BizTalk orchestration that is picking up messages from an MSMQ. It processes the message and sends it on to another system.
The thing is, whenever a message is put on the queue, BizTalk dequeues it immediately even if it is still processing the previous message. This is a real pain because if I restart the orchestration then all the unprocessed messages get deleted.
Is there any way to make BizTalk only take one message at a time, so that it completely finishes processing the message before taking the next one?
Sorry if this is an obvious question, I have inherited a BizTalk system and can't find the answer online.
There are three properties of the BizTalk MSMQ adapter you could try to play around with:
batchSize
Specifies the number of messages that the adapter will take off the queue at a time. The default value is 20.
This may or may not help you. Even when set to 1, I suspect BTS will try to consume remaining "single" messages concurrently as it will always try parallel processing, but I may be wrong about that.
serialProcessing
Specifies messages are dequeued in the order they were enqueued. The default is false.
This is more likely to help because to guarantee ordered processing, you are fundamentally limited to single threaded processing. However, I'm not sure if this will be enough on its own, or whether it will only mediate the ordering of message delivery to the message box database. You may need to enable ordered delivery throughout the BTS application too, which can only be done at design time (i.e. require code changes).
transactional
Specifies that messages will be sent to the message box database as part of a DTC transaction. The default is false.
This will likely help with your other problem where messages are "getting lost". If the queue is non-transactional, and moreover, not enlisted in a larger transaction scope which reaches down to the message box DB, that will result in message loss if messages are dequeued but not processed. By making the whole process atomic, any messages which are not committed to the message box will be rolled back onto the queue.
Sources:
https://msdn.microsoft.com/en-us/library/aa578644.aspx
While you can process the messages in order by using Ordered Delivery, there is no way to serialize to they way you're asking.
However, merely stopping the Orchestration should not delete anything, much less 'all the unprocessed messages'. Seems that's you problem.
You should be able to stop processing without losing anything.
If the Orchestration is going into a Suspended state, then all you need to do is Resume that one orchestration and any messages queued will remain and be processed. This would be the default behavior even if the app was created 'correctly' by accident ;).
When you Stop the Application, you're actually Terminating the existing Orchestration and everything associated with it, including any queued messages.
Here's your potential problem, if the original developer didn't properly handle the Port error, the Orchestration might get stuck in an un-finishable Loop. That would require a (very minor) mod to the Orchestration itself.

Kafka - Dynamic / Arbitrary Partitioning

I'm in the process of building a consumer service for a Kafka topic. Each message contains a url to which my service will make an http request. Each message / url is completely independent from other messages / urls.
The problem I'm worried about is how to handle long-running requests. It's possible for some http requests to take 50+ minutes before a response is returned. During that time, I do not want to hold up any other messages.
What is the best way to parallelize this operation?
I know that Kafka's approach to parallelism is to create partitions. However, from what I've read, it seems that you need to define the number of partitions up front when I really want an infinite or dynamic number of partitions (ideally each message gets its own partition created on the fly)
As an example, let's say I create 1,000 partitions. If 1,001+ messages are produced to my topic, the first 1,000 requests will be made but every message after that will be queued up until the previous request in that partition finishes.
I've thought about making the http requests asynchronous but then I seem to run into a problem when determining what offset to commit.
For instance, on a single partition I can have a consumer read the first message and make an async request. It provides a callback function which commits that offset to Kafka. While that request is waiting, my consumer reads the next message and makes another async request. If that request finishes before the first it will commit that offset. Now, what happens if the first request fails for some reason or my consumer process dies? If I've already committed a higher offset, it sounds like this means my first message will never get reprocessed, which is not what I want.
I'm clearly missing something when it comes to long-running, asynchronous message processing using Kafka. Has anyone experienced a similar issue or have thoughts on how to best solve this? Thanks in advance for taking the time to read this.
You should look at Apache Storm for the processing portion of your consumer and leave the message storage and retrieval to Kafka. What you've described is a very common use case in Big Data (although the 50+ minute thing is a bit extreme). In short, you'll have a small number of partitions for your topic and let Storm stream processing scale the number of components ("bolts" in Storm-speak) that would actual make the http requests. A single spout (the kind of storm component that reads data from an external source) could read the messages from the Kafka topic and stream them to the processing bolts.
I've posted an open source example of how to write a Storm/Kafka application on github.
Some follow-on thoughts to this answer:
1) While I think Storm is the correct platform approach to take, there's no reason you couldn't roll your own by writing a Runnable that performs the http call and then write some more code to make a single Kafka consumer read messages and process them with multiply-threaded instances of your runnable. The management code required is a bit interesting, but probably easier to write than what it takes to learn Storm from scratch. So you'd scale by adding more instances of the Runnable on more threads.
2) Whether you use Storm or your own multi-threaded solution, you'll still have the problem of how to manage the offset in Kafka. The short answer there is that you'll have to do your own complex offset management. Not only will you have to persist the offset of the last message you read from Kafka, but you'll have to persist and manage the list of in-flight messages currently being processed. In this way, if your app goes down, you know what messages were being processed and you can retrieve and re-process them when you start back up. The base Kafka offset persistence doesn't support this more complex need, but it's only there as a convenience for the simpler use cases anyway. You can persist your offsets info anywhere you like (Zookeeper, file system or any data base).

How do I specify a redelivery policy and separate retry queue processor in Rebus

I'm currently investigating Rebus but being unable to find good documentation this process is proving difficult. I am hoping someone can help me understand this exciting product.
I have read that during message processing, if something goes wrong the message will return to the queue.
Is the message returned to the front of the queue or placed on the end? If placed on the front this will be problem because the queue in essence becomes blocked with a message that may not be able to be processed - at least until it times out or retries exceeded.
Does Rebus have support for an out-of-the-box separate Retry queue?
Can I specify the interval between retries?
Can I specify an exponential backoff interval for retries as in Apache ActiveMQ?
Thanks
1) The queue transaction is rolled back, effectively moving the message back in front - therefore, it will be immediately retried.
After 5 failed attempts (at least that is the default), Rebus will move the message to the error queue. The default retry mechanism is intentionally very swift - this way, the input queue will never be clogged by poisonous messages.
If you need more sophisticated retries, I suggest you tage a look at bus.Defer - it can defer delivery of a message to the future. It requires that you have a timeout manager(*) running though.
2) I guess that's what I call "error queue", except there's no retry :)
I did create a solution some time, though, where I coded a simple endpoint that would periodically empty the error queue and move all the messages back into the original source queue, as a form of crude automatic second-level retry mechanism.
3) No. NServiceBus has the concept of second-level retries, but this is something that I've never really needed (enough) with Rebus. But with Rebus, you're on your own here - it should be fairly easy to do some intelligent bus.Defer that can then be easily adapted to each kind of error that you're expecting.
4) See (3)
I hope that clarifies a bit :)
(*) The timeout manager can be a separate endpoint whose only job in life is to receive a message, hold on to it for a while (i.e. save it to a database), and then return it to the sender when the time has elapsed. The timeout manager can be hosted in-process though, but using the .Timeouts(t => t.???) configuration spell.

How to force the current message to be suspended and be retried later on from within a custom BizTalk **send** pipeline component?

Here is my scenario. BizTalk needs to transfer a file from a shared/central document library. First BizTalk receives an incoming message with a reference/path to this document in the library. Then it simply needs to read it out from this library and send it (potentially through different adapters). This is in essence, a scenario not so remote from the ClaimCheck EAI pattern.
Some ways to implement a claim check have been documented, noticeably BizTalk ESB Toolkit Claim Check, and BizTalk 2009: Dealing with Extremely Large Messages, Part I & Part II. These implementations do however take the assumption that the send pipeline can immediately read the stream that has been “checked in.”
That is not my case: the document will take some time before it is available in the shared library, and I cannot delay the initial received message. That leaves me with 2 options: either introduce some delay via an orchestration or ensure the send port will later on retry if the document is not there yet.
(A delay can only be introduced via an orchestration, there is no time-based subscriptions in BizTalk. Right?)
Since this a message-only flow I’d figure I could skip the orchestration. I have seen ways on how to have "Custom Retry Logic in Message Only Solution Using Pipeline" but what I need is not only a way to control the retry behavior (as performed by the adapter) but also to enforce it right from within the pipeline…
Every attempt I made so far just ended up with a suspended message that won’t be automatically retried even though the send adapter had retry configured… If this is indeed possible, then where/what should I do?
Oh right… and there is queuing… but unfortunately neither on premises nor in the cloud ;)
OK I may be pushing the limits… but just out of curiosity…
Many thanks for your help and suggestions!
I'm puzzled as to how this could be done without an Orch. The only way I can think of would be along the lines of:
The receive port for the initial messages just 'eats' the messages,
e.g. subscribing these messages to a dummy Send port with the Null Adapter,
ignoring them totally.
You monitor the Shared document library with a receive port, looking for any ? any new? document there.
Any located documents are subscribed by a send port and sent downstream.
An orchestration based approach would be along the lines of:
Orch is triggered by a receive of the Initial notification of an 'upcoming' new file to the library. If your initial notification is request response (e.g. exposed web service, you can immediately and synchronously issue the response)
Another receive port is used to do the monitoring of availability and retrieval of the file from shared library, correlating to the original notification message (e.g. by filename, or other key)
A mechanism to handle the retry if the document isn't available, and potentially an eventual timeout, e.g. if the document never makes it to the shared library.
And on success, a send port to then send the document downstream
Placing the delay shape in the Orch will offer more scalability than e.g. using Thread.Sleep() or similar in custom adapter or pipeline code, since BTS just calculates ad stamps the 'awaken' timestamp on the SQL record and can then dehydrate the orch, freeing up the thread.
The 'is the file there yet?' check can be done with a retry loop, delaying after each failed check, with a parallel branch with a timeout e.g. after an hour or so.
The polling interval can be controlled in the receive location, so I do not understand what you mean by there is no time based subscriptions in Biztalk. You also have a schedule window.
One way to introduce delay is to send that initial message to an internal webservice, which will simply post back the message to Biztalk after a specified time interval.
There are also loopback adapters, which simply post the message back into the messagebox. This can be ammended to add a delay.

Resources