Rebus retry policy when RabbitMQ is temporarily down

I have a dockerized microservice architecture where I am using Rebus with RabbitMQ as message bus.
One container is running RabbitMQ. Other containers are running services that communicate with each other via Rebus/RabbitMQ.
I want my solution to be resilient to container restarts, so if, for example, the RabbitMQ container restarts, I expect the other services to be unaffected.
I expect that messages sent while RabbitMQ is down are queued up for delivery by Rebus in the sending service and that they are delivered when the RabbitMQ connection is restored.
To verify that, I run this test scenario:
1. Service A sends a message to service B via Rebus and RabbitMQ. That works fine.
2. I stop the RabbitMQ container.
3. Service A sends a message to service B via Rebus and RabbitMQ. That fails because RabbitMQ is unavailable.
4. I start the RabbitMQ container again.
5. I can see that Rebus in my services automatically reconnects to RabbitMQ when it is up. That is as expected.
Now that the RabbitMQ connection is restored, I would expect Rebus to send the pending message from service A to service B, but it does not.
Is this not expected behaviour of Rebus? If not, can I enable this feature?
I have read this topic https://github.com/rebus-org/Rebus/wiki/Automatic-retries-and-error-handling
and tried to configure Rebus like this:
Configure.With(...)
.Options(b => b.SimpleRetryStrategy(maxDeliveryAttempts: 10))
.(...)
but with no luck.

The "delivery attempts" setting you're configuring determines how many times Rebus should try to handle a received message before giving up (i.e. moving it to the error queue).
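To illustrate what that setting governs, here is a toy Python model (not Rebus' actual implementation or API) of a receive-side retry strategy: each received message is handed to a handler up to `max_delivery_attempts` times, and after the last failure it is moved to an error queue. The function name, the deque-based queues and the handler are all hypothetical.

```python
from collections import deque

def receive_with_retries(input_queue, error_queue, handler, max_delivery_attempts=5):
    """Toy receive-side retry strategy (hypothetical, not the Rebus API).

    Each message taken from input_queue is passed to handler up to
    max_delivery_attempts times; if every attempt raises, the message is
    moved to error_queue together with the text of the last error.
    """
    handled = []
    while input_queue:
        message = input_queue.popleft()
        last_error = None
        for _ in range(max_delivery_attempts):
            try:
                handler(message)
            except Exception as exc:  # every failure uses up one delivery attempt
                last_error = exc
            else:
                handled.append(message)
                break
        else:
            # for-loop ended without break: all attempts failed
            error_queue.append((message, str(last_error)))
    return handled

# Example: "ok" succeeds on its 3rd attempt, "poison" never does.
attempts = {}

def flaky_handler(message):
    attempts[message] = attempts.get(message, 0) + 1
    if message == "poison" or attempts[message] < 3:
        raise RuntimeError(f"cannot handle {message}")

inbox = deque(["ok", "poison"])
errors = []
done = receive_with_retries(inbox, errors, flaky_handler, max_delivery_attempts=10)
```

The loop only ever sees messages that were successfully received; a send that failed because the broker was down never gets into `input_queue`, which is why this setting cannot help with the scenario in the question.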
If Rebus loses its connection to the broker, it will not be able to receive anything for the entire duration of the outage, so stopping RabbitMQ should effectively pause all message processing (possibly with some exceptions for the messages being handled at the instant RabbitMQ goes away).
Since no Rebus handlers will be running while RabbitMQ is down, the outgoing messages you have to deal with are the ones sent from other places, e.g. messages sent/published from a web request.
(...) I expect that messages sent while RabbitMQ is down are queued up for delivery by Rebus (...)
...but Rebus cannot queue anything up, because RabbitMQ is down(*).
The natural thing to do for Rebus in this situation is to give you, the caller, the responsibility of deciding what to do about the problem.
In .NET, you usually do that by throwing an exception back at you. 🙂
This leaves you with the option of
performing some alternative action, or
retrying some more times, or
whatever makes sense in that particular situation
A simple approach to building some resilience into your system in this case would be to use something like Polly to try sending outgoing messages multiple times in cases where it could fail.
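Polly is a .NET library, so the following is only a Python sketch of the same idea, not Polly's API: wrap the send call in a retry loop with exponential backoff and let the exception reach the caller once the attempts are used up. `BrokerUnavailable`, `send_with_retry` and the delay values are hypothetical; a real RabbitMQ client raises its own exception types.

```python
import time

class BrokerUnavailable(Exception):
    """Hypothetical error raised by a send call when the broker is down."""

def send_with_retry(send, message, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send(message), retrying with exponential backoff on BrokerUnavailable.

    If the last attempt also fails, the exception propagates, so the caller
    still decides what to do (alternative action, give up, etc.).
    """
    for attempt in range(attempts):
        try:
            return send(message)
        except BrokerUnavailable:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...

# Example: the broker is "restarting" and refuses the first two sends.
calls = []

def send_while_restarting(message):
    calls.append(message)
    if len(calls) <= 2:
        raise BrokerUnavailable("connection refused")
    return f"delivered {message}"

delays = []
result = send_with_retry(send_while_restarting, "hello", sleep=delays.append)
```

The `sleep` parameter is injectable only so the example runs without waiting. As with any in-process retry, a message that is still being retried is lost if the process itself dies, which is the same trade-off described in the footnote below.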
I hope that makes sense. Please let me know if anything needs to be elaborated on. 🙂
(*) Of course Rebus could have "cheated" and queued outgoing messages up in memory, but that would make it very hard for you to write resilient code, because you would not know whether an outgoing message had been safely delivered to the broker, or whether it was just sitting in memory waiting to be saved somewhere.

Related

AMQP, RabbitMQ Push API: how does it work?

I'm trying to get a deep understanding of how the Push API communication between the client and the RabbitMQ server works.
As far as I know (but correct me if I'm wrong), the client opens a TCP connection to the broker (RabbitMQ) and keeps this connection alive until the client decides to close it. During this connection, the client can receive messages immediately.
My question is: during this connection, does the client monitor the broker and ask it for messages, or, when the broker forwards a message to the queue the client subscribed to, does it just use that connection to push the data to the client?
First case: the client monitors the broker for messages.
Second case: the client doesn't need to monitor the broker; the broker just pushes the data.
Or something else?
There are two options to receive messages:
1. The client registers a consumer callback (basicConsume) on the channel; the broker then "pushes" messages to the consumer.
2. The client sends the broker a basicGet and receives one message (if present).
The first use case is the most common.
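As a rough mental model of the two options, here is a toy in-memory Python class. It does not speak AMQP; the method names merely mirror basic.consume/basic.get, and the class itself is hypothetical. With a registered consumer, the "broker" calls the client's callback as soon as a message arrives, so the client never polls, while `basic_get` is a one-off request that returns a message or nothing.

```python
from collections import deque

class ToyBroker:
    """In-memory stand-in for one broker queue (not the AMQP protocol)."""

    def __init__(self):
        self.messages = deque()
        self.consumer = None

    def basic_publish(self, body):
        if self.consumer is not None:
            self.consumer(body)         # push: delivered to the consumer immediately
        else:
            self.messages.append(body)  # no consumer yet: message waits in the queue

    def basic_consume(self, callback):
        """Register a consumer; anything already queued is pushed right away."""
        self.consumer = callback
        while self.messages:
            callback(self.messages.popleft())

    def basic_get(self):
        """Pull: the client asks for one message; None if the queue is empty."""
        return self.messages.popleft() if self.messages else None

# Example: pull one message, then switch to push delivery.
broker = ToyBroker()
broker.basic_publish("m1")
pulled = broker.basic_get()
empty = broker.basic_get()
received = []
broker.basic_consume(received.append)
broker.basic_publish("m2")
broker.basic_publish("m3")
```

In real AMQP the push travels over the already-open connection and channel: after basic.consume, the broker sends deliveries to the consumer without the client asking again.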
Since you tagged the question with spring-amqp, I assume you are interested in Spring. For the first case, Spring AMQP has a listener container (and the @RabbitListener annotation); for the second case, one of the RabbitTemplate receive operations can be used.
I suggest you look at the tutorials to get a basic understanding. They cover several languages including pure java and Spring AMQP.
You can also look at the Spring AMQP Reference Manual.

How can I inspect the error queue from Rebus?

I have an input queue which runs fine. Sometimes a message ends up in the error queue.
Now I want to be able to inspect these messages and maybe forward them to the input queue again if I know a particular message will pass.
How do I start inspecting the error queue? Are there any best practices?
I can't just do a .CreateBus().Start() because that would start the normal handlers processing messages.
The way you inspect queues and the options you get depends on the chosen transport.
If you're using Rebus with MSMQ, the easiest way to inspect your queues (input queues, error queues, MSMQ dead-letter queues) and retry delivery of failed messages is to fire up Rebus Snoop. Rebus has a ReturnToSourceQueue CLI tool for MSMQ as well.
If you're using Azure Service Bus, I can recommend Paolo Salvatori's Service Bus Explorer which I've used a little bit myself on a few projects.
With RabbitMQ, I usually use RabbitMQ's built-in web management plugin to inspect queues, and then Rebus comes with a ReturnToSourceQueue CLI tool for RabbitMQ as well.
If you're using SQL Server, I can recommend firing up SQL Server Management Studio and getting your SQL-fu on ;)
If you want to code something that does some kind of automatic forwarding or handling of failed messages, I can recommend using Rebus' transport implementations (i.e. MsmqMessageQueue (along with MsmqUtil), RabbitMqMessageQueue, AzureServiceBusMessageQueue, etc.) to handle the receiving and sending of raw transport messages. It's an approach I've used several times myself, e.g. to implement crude second-level retry mechanisms and forwarding and archival of failed messages.
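A minimal sketch of the "forward failed messages back to where they came from" idea, in the spirit of the ReturnToSourceQueue tools mentioned above, using in-memory deques instead of a real transport. It assumes each failed message carries the name of its source queue in a header; the header key `source-queue` used here is a placeholder, not Rebus' actual header name, and `should_retry` is a hypothetical filter you would supply.

```python
from collections import deque

SOURCE_QUEUE_HEADER = "source-queue"  # placeholder key, not Rebus' real header name

def return_to_source_queue(error_queue, queues, should_retry=lambda msg: True):
    """Move failed messages from error_queue back to the queue they came from.

    Each message is a dict with "headers" and "body". Messages rejected by
    should_retry, or whose source queue is unknown, stay in the error queue.
    Returns the number of messages moved.
    """
    kept = deque()
    moved = 0
    while error_queue:
        msg = error_queue.popleft()
        target = queues.get(msg["headers"].get(SOURCE_QUEUE_HEADER))
        if target is not None and should_retry(msg):
            target.append(msg)
            moved += 1
        else:
            kept.append(msg)
    error_queue.extend(kept)
    return moved

# Example: only messages we believe will now succeed are sent back.
queues = {"orders": deque()}
errors = deque([
    {"headers": {SOURCE_QUEUE_HEADER: "orders"}, "body": "order 1"},
    {"headers": {SOURCE_QUEUE_HEADER: "orders"}, "body": "bad order"},
    {"headers": {}, "body": "no source"},
])
moved = return_to_source_queue(errors, queues,
                               should_retry=lambda m: m["body"] != "bad order")
```

With a real transport you would want the receive from the error queue and the send to the source queue to happen in one transaction, so a crash halfway through cannot lose the message.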

Job Manager Message Queue over HTTP

I have a worker that can currently only accept jobs/tasks over HTTP. That is, instead of running a daemon listening on TCP ports and just getting raw messages, it only listens for HTTP messages. (I know HTTP is just an additional layer over TCP.) So jobs have to be constructed and wrapped in HTTP messages.
I want to use a job manager to queue tasks and send these tasks over HTTP to a pool of the workers as described above.
Are there any job managers that relay tasks over HTTP? I don't mean accepting tasks over HTTP, which doesn't matter, but they must be able to send tasks to workers over HTTP.
There are other functionalities a job manager possesses, such as fault tolerance. And even though the HTTP connection is not persistent, is it possible to replicate all the TCP signals a worker will return to the job manager over HTTP?
One solution I was considering is having a proxy in between that translates the TCP messages into HTTP messages, but this seemed difficult to do.
I believe a better architecture would be a mature job queue plus a wrapper for your worker's API:
1. Choose a job scheduler/queue that fits your requirements (Celery or whatever you like).
2. Write a wrapper script that is able to submit jobs to your worker, report the worker's status, etc.
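A minimal Python sketch of that split, with the standard library's `queue.Queue` standing in for a real job queue such as Celery (this is not Celery's API): `relay_jobs` takes jobs off the queue and the wrapper `submit_over_http` POSTs each one as JSON to the worker, retrying on network errors. The JSON job format, the `/jobs` path and the `FakeWorker` handler (which only doubles a number so the example can run locally) are hypothetical.

```python
import json
import queue
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class FakeWorker(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the HTTP-only worker: doubles the job's "n"."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        job = json.loads(self.rfile.read(length))
        body = json.dumps({"result": job["n"] * 2}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

# Connect directly, ignoring any http_proxy environment settings.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def submit_over_http(worker_url, job, timeout=5.0):
    """The wrapper: POST one job as JSON and return the worker's JSON reply."""
    request = urllib.request.Request(
        worker_url,
        data=json.dumps(job).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _opener.open(request, timeout=timeout) as response:
        return json.loads(response.read())

def relay_jobs(jobs, worker_url, max_attempts=3):
    """Take jobs off the queue and relay each one to the worker over HTTP.

    urllib reports network and HTTP errors as OSError subclasses; those are
    retried up to max_attempts times, and jobs that still fail are returned.
    """
    results, failed = [], []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return results, failed
        for attempt in range(max_attempts):
            try:
                results.append(submit_over_http(worker_url, job))
                break
            except OSError:
                if attempt == max_attempts - 1:
                    failed.append(job)

# Example: start the fake worker on a free local port and relay three jobs.
server = HTTPServer(("127.0.0.1", 0), FakeWorker)
threading.Thread(target=server.serve_forever, daemon=True).start()
worker_url = f"http://127.0.0.1:{server.server_address[1]}/jobs"

jobs = queue.Queue()
for n in (1, 2, 3):
    jobs.put({"n": n})
results, failed = relay_jobs(jobs, worker_url)
server.shutdown()
server.server_close()
```

Because the worker's HTTP response carries the job's result, the relay gets a per-job success/failure signal without a persistent TCP connection, and fault tolerance (retries, the `failed` list) lives in the relay rather than in the worker.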

Reliable WCF service with MSMQ + order processing web application: delivery of one-way calls

I am trying to implement Reliable WCF Service with MSMQ based on this architecture (http://www.devx.com/enterprise/Article/39015)
A message may be lost if the queue is not available (even a cluster doesn't provide zero downtime).
Take a look at this simple order processing workflow:
1. A user enters credit card details and makes a payment.
2. The application receives a success result from the payment gateway.
3. The application sends a message as a “fire and forget”/“one way” call to a backend service via the WCF MSMQ binding.
4. The user is redirected to the “success” page.
5. The message is stored in a REMOTE transactional queue (Windows cluster).
6. The backend service dequeues and processes the message, completes a complex order processing workflow and, as a result, sends an email confirmation to the user.
Everything looks fine, as expected.
What I cannot understand is how we can guarantee that all “one way” calls will be delivered to the queue.
Duplex communication is not an option, because the user should be redirected to the result web page ASAP.
Imagine the case where a user receives a “success” page with text like “… Your payment was made, your order is being processed, and you will receive email notifications later…”, but the message itself is lost.
How can durability be implemented for step 3?
One of the possible solutions that I can see is:
3a. Create a database record with the transaction details, marked as uncompleted, just to have some record of the transaction. This record may be used as a starting point to process the lost message in case the message is not saved in the queue.
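Step 3a can be sketched in Python with sqlite3 as follows. The table `pending_orders`, the `sent` flag and the functions are hypothetical, and `send` stands in for the one-way WCF/MSMQ call: the order row is committed first, the send is attempted, the row is flagged once the send returns, and a recovery job re-sends rows that were never flagged.

```python
import sqlite3

def init(db):
    db.execute("CREATE TABLE IF NOT EXISTS pending_orders ("
               "id INTEGER PRIMARY KEY, payload TEXT NOT NULL, "
               "sent INTEGER NOT NULL DEFAULT 0)")

def mark_sent(db, order_id):
    with db:  # commits on success
        db.execute("UPDATE pending_orders SET sent = 1 WHERE id = ?", (order_id,))

def place_order(db, payload, send):
    """Record the order as not yet sent (committed), then try the one-way send."""
    with db:
        cur = db.execute("INSERT INTO pending_orders (payload) VALUES (?)", (payload,))
    order_id = cur.lastrowid
    try:
        send(order_id, payload)
    except Exception:
        return order_id  # row stays unsent; the recovery job will retry it
    mark_sent(db, order_id)
    return order_id

def resend_unsent(db, send):
    """Recovery job: retry every order whose message never reached the queue."""
    rows = db.execute("SELECT id, payload FROM pending_orders "
                      "WHERE sent = 0 ORDER BY id").fetchall()
    resent = []
    for order_id, payload in rows:
        try:
            send(order_id, payload)
        except Exception:
            continue  # queue still unavailable; try again on the next run
        mark_sent(db, order_id)
        resent.append(order_id)
    return resent

# Example: the queue is down for the first order, up for the second.
db = sqlite3.connect(":memory:")
init(db)
queue_up = False
delivered = []

def send(order_id, payload):
    if not queue_up:
        raise ConnectionError("queue unavailable")
    delivered.append((order_id, payload))

first = place_order(db, "order A", send)   # send fails, row stays unsent
queue_up = True
second = place_order(db, "order B", send)  # sent immediately
resent = resend_unsent(db, send)           # recovers order A
```

If the send succeeds but flagging the row fails, the recovery job will send that order again, so the backend should be able to detect duplicates (for example by order id).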
I read this post:
The main thing to understand about transactional MSMQ is that there are three distinct transactions involved in a transactional send to a remote queue.
1. The sender writes the message to a local queue.
2. The queue manager on the sender's machine transmits the message across the wire to the queue manager on the recipient machine.
3. The receiver service processes the queue message and then removes the message from the queue.
But it doesn’t solve the described issue: as far as I know, WCF netMsmqBinding doesn’t use a local queue to send messages to a remote one.
But it doesn’t solve the described issue: as far as I know, WCF netMsmqBinding doesn’t use a local queue to send messages to a remote one.
Actually this is not correct. MSMQ always sends to a remote queue via local queue, regardless of whether you are using WCF or not.
If you send a message to a remote queue and then look in Message Queuing in Server Management, you will see under Outbound queues that a queue has been created with the address of the remote queue. This is a temporary queue which is automatically created for you. If the remote queue was for some reason unavailable, the message would sit in the local queue until it became available, and then it would be transmitted.
So durability is provided because of the three-phase commit:
transactionally write message locally
transactionally transmit message
transactionally receive and process message
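The three steps can be modelled with a toy Python store-and-forward sketch. This is not the MSMQ or WCF API: the class and function names are hypothetical, and plain deques replace real transactional queues. `send` only writes to a local outgoing queue, so it succeeds even while the remote queue is unavailable; `transmit` moves messages across once the remote side is reachable; and the receiver removes a message only after processing it successfully.

```python
from collections import deque

class RemoteQueue:
    """Stand-in for the queue on the recipient machine."""
    def __init__(self):
        self.available = True
        self.messages = deque()

class SenderQueueManager:
    """Toy store-and-forward on the sender's machine."""
    def __init__(self, remote):
        self.remote = remote
        self.outgoing = deque()  # the local "outbound" queue

    def send(self, message):
        # Step 1: write locally; succeeds even if the remote queue is down.
        self.outgoing.append(message)

    def transmit(self):
        # Step 2: move messages across only while the remote side is reachable.
        while self.outgoing and self.remote.available:
            self.remote.messages.append(self.outgoing.popleft())

def receive_and_process(remote, process):
    """Step 3: the message is removed only if processing succeeds."""
    if not remote.messages:
        return False
    message = remote.messages[0]
    process(message)            # if this raises, the message stays queued
    remote.messages.popleft()
    return True

# Example: the remote queue is down when the web app sends.
remote = RemoteQueue()
sender = SenderQueueManager(remote)
remote.available = False
sender.send("order 42")        # the web app can return immediately
sender.transmit()              # nothing moves: remote is down
still_local = list(sender.outgoing)
remote.available = True
sender.transmit()
processed = []
receive_and_process(remote, processed.append)
```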
There are instances where you may drop messages, for example if your message processing happens outside the scope of the dequeue transaction, and also instances where it is not possible to know whether the processing was successful (e.g. a back-end web service call times out), and of course you could have a badly formed message which will never be processed successfully, but in all cases it should be possible to design for these.
If you're using public queues in a clustered environment, I think there may be more scope for failure, as clustering MSMQ introduces complexity (I have not really used it, so I don't know for sure), so try to avoid it if possible.

Resume BizTalk dehydrated orchestration

How can I resume a dehydrated orchestration ?
The orchestration in question should have been retrieving messages from an MSMQ queue,
but the user ID permission wasn't set on the queue, so the BizTalk box wasn't able to read from the queue.
I corrected the permissions, but the only options are terminate and suspend?
If the orchestration attempted to start and failed on the MSMQ receive, it's essentially hung and has not removed a message from the queue. I'd terminate it. The orchestration should clear and pickup the new messages. Does your orchestration implement a singleton pattern or are you using ordered delivery on the receive? This makes things a little more complicated.
Shouldn't you be restarting the BizTalk service instance for MSMQ?
Dehydrated means the orchestration is still waiting for something. I guess in your case it must be waiting for a correlated message from MQ. If you restart the receive host's service instance, it will try to reconnect all connections (MSMQ, SQL, etc. managed by the service instance). Then all messages will flow through to the orchestrations.
Update 1:
Check the relevant receive location. Maybe it got disabled by BizTalk due to the permission problem. You will have to enable it manually.
Update 0:
You don't have to resume the dehydrated orchestration. It's not the orchestration that reads from the queue, but the MSMQ adapter. When an MSMQ message arrives, the receive location will route it into the MessageBox. If the said orchestration has a subscription (receive port) that matches the MSMQ message, it will be resumed by the BizTalk engine.
Can you suspend, then resume?
It's been a couple of years since I did BizTalk. Quirks like this were annoying. Even worse when there are 250k dehydrated instances and you need a script to restart them. Ugh.
I feel for you.
BizTalk's ability to resume depends on where and how it failed, and whether it can replay any part of the operation; in most cases, when an orchestration fails, some coding pattern needs to be used to allow it to resume.

Resources