MQSeries: Is syncpoint/rollback possible when getting asynchronously with MCB? - asynchronous

I want to pull messages off a MQS queue in a C client, and would love to do so asynchronously so I don't have to start (explicitly) multithreading. The messages will be forwarded to another system that acts "transactionally" but is completely incompatible with XA. So I'd like to have a way to explicitly commit (and thereby remove) a message that's been successfully handed off to the other system, and not commit if this failed, so that the last message is retained for a more successful later attempt.
I've read about the SYNCPOINT option and understand how I'd use that around a regular GET, but I haven's seen any hints on how to make asynchronous message retrieval have transactional behavior like this. Any hints, please?

I think you are describing using the asynchronous callback capability, ie you register a routine to be called when a message arrives, and ask for any get to be under syncpoint... An explanation of how some of it works is in here, https://share.confex.com/share/117/webprogram/Handout/Session9513/share_advanced_mqi.pdf page 4+
Effectively you get called with the MQ message under syncpoint, do your processing with another system, then commit or rollback the message before returning.
Be aware without the use of e.g. XA 2 phase commit, there is always going to be the windows of e.g. committing to the external system and a power outage means the message under the unit of work gets rolled back inside MQ as you didnt have time to perform the commit.

Edit: my misunderstanding, didn't realise that the application was using a callback to retrieve messages, which is indeed fully asynchronous behavior. Disregard the answer below.
Do MQGET with MQGMO_SYNCPOINT, then issue either MQCMIT or MQBACK.
"Asynchronous" and "synchronous" may be misnomers - these are your patterns of using MQ - whether you wait for a reply message or not, these patterns do not affect how MQ processes your calls. Transaction management (unit of work management) works across any MQI calls that use SYNCPOINT, no matter if they are part of a request/reply pattern or not.

Related

Axon FW 4.6 comes with Dead Letter Queue support, but is it possible to still have rollback support in case of exception?

we're looking into using the new feature of Axon, dead letter queue(DLQ).
In our previous application (axon 4.5x) we have a lot of eventhandlers updating projections. Our default is to rethrow exceptions when they occur, which will trigger a rollback for the database updates. Perhaps not the best practice to rely on this behaviour (because it can not rollback everything, eg sending an email from event can not be reverted of course)
This behaviour seems to get lost when introducing DLQ in our applications, which has big impact on our current application flow (projections are updated when they previously weren't). This makes upgrading not that easy.
Is it possible to still get the old behaviour(transaction rolled back in case of exceptions) together with DLQ processing?
What we tried was building a test application to test the new DLQ features. While playing around all looks fine in case of exceptions (they were moved to dlq) but the projections still got updated (not rolled back as before)
We throw an exception after the .save() of the projection simulating a database failure to see if events involved (we have multiple eventhandlers for an event updating projections) got rolled back.
You need to choose here, #davince. Storing a dead-letter in the dead-letter queue similarly requires a database transaction.
To ensure the token still progresses and the dead letter is entered, the framework uses the existing transaction.
Furthermore, in practical terms event handling was successful.
Hence, a rollback for some of the parts wouldn't be feasible.
The best way to deal with this, as is mentioned in the Reference Guide, is to make your event handlers idempotent:
Before configuring a SequencedDeadLetterQueue it is vital to validate whether your event handling functions are idempotent.
As a processing group consists of several Event Handling Components
(as explained in the intro of this chapter), some handlers may succeed
in event handling while others will not. As a configured dead-letter
queue does not stall event handling, a failure in one Event Handling
Component does not cause a rollback for other event handlers.
Furthermore, as the dead-letter support is on the processing group
level, dead-letter processing will invoke all event handlers for that
event within the processing group.
Thus, if your event handlers are not idempotent, processing letters may result in undesired side effects.
Hence, we strongly recommend making your event handlers idempotent when using the dead-letter queue.
The principle of exactly-once delivery is no longer guaranteed; at-least-once delivery is the reality to cope with.
I understand this requires some rework on your end, #davince. Although the feature is powerful, it comes with a certain level of responsibility before you use it.
I hope the above clarifies this for you.
Added, I'd like to point out that the version upgrade in itself does not require you to use the dead-letter queue. Hence, this change shouldn't impose any strains for updating to the latest release.
Update 1
Sometimes you need to think about an issue a bit longer.. I was just wondering the following things about your setup. Perhaps I can help out on that front:
What storage mechanism do you use to store projections in?
Where are you storing your tokens?
Where are you planning to store your dead-letters?
How are you invoking the storage layer from your event handlers?
Update 2
Thanks for sharing that you're using PostgreSQL for your projections, tokens, and dead letters. And that you're using JPA entities for storage.
It gives more certainty about your setup, as it may impact how your system would react in case of a rollback.
However, as it's rather vanilla/regular, the shared comment from the Reference Guide still applies.
This, sadly enough, means some work on your end, #davince. I hope the road forward to start using the SequencedDeadLetterQueue from Axon Framework is clear.
By the way, if you ever have recommendations on how the framework or the documentation may be improved, be sure to file issues in GitHub here and here, respectively.

Axon Event Processing Timeout

I am using an Axon Event Tracking processor. Sometimes events take longer that 10 seconds to process.
This seems to cause the message to be processed again and this appears in the log "Releasing claim of token X/0 failed. It was owned by another node."
If I up the number of segments it does not log this BUT the event is still processed twice so I think this might be misleading. (I think I was mistaken about this)
I have tried adjusting the fetchDelay, cleanupDelay and tokenClaimInterval. None of which has fixed this. Is there a property or something that I am missing?
Edit
The scenario taking longer than 10 seconds is making a HTTP request to an external service.
I'm using axon 4.1.2 with all default configuration when using with Spring auto configuration. I cannot see the Releasing claim on token and preparing for retry in [timeout]s log.
I was having this issue with a single segment and 2 instances of the application. I realised I hadn't increased the number of segments like I thought I had.
After further investigation I have discovered that adding an additional segment seems to have stopped this. Even if I have for example 2 segments and 6 applications it still doesn't reappear, however I'm not sure how this is different to my original scenario of 1 segment and 2 application?
I didn't realise it would be possible for multiple threads to grab the same tracking token and process the same event. It sounds like the best action would be to put an idem-potency check before the HTTP call?
The Releasing claim of token [event-processor-name]/[segment-id] failed. It was owned by another node. message can only occur in three scenarios:
You are performing a merge operation of two segments which fails because the given thread doesn't own both segments.
The main event processing loop of the TrackingEventProcessor is stopped, but releasing the token claim fails because the token is already claimed by another thread.
The main event processing loop has caught an Exception, making it retry with a exponential back-off, and it tries to release the claim (which might fail with the given message).
I am guessing it's not options 1 and 2, so that would leave us with option 3. This should also mean you are seeing other WARN level messages, like:
Releasing claim on token and preparing for retry in [timeout]s
Would you be able to share whether that's the case? That way we can pinpoint a little better what the exact problem is you are encountering.
By the way, very likely you have several processes (event handling threads of the TrackingEventProcessor) stealing the TrackingToken from one another. As they're stealing an un-updated token, both (or more) will handled the same event. Hence why you see the event handler being invoked twice.
Obviously undesirable behavior and something we should resolve for you. I would like to ask you to provide answers to my comments under the question, as right now I have to little to go on. Let us figure this out #Dan!
Update
Thanks for updating your question #dan, that's very helpful.
From what you've shared, I am fairly confident that both instances are stealing the token from one another. This does depend though on whether both are using the same database for the token_entry table (although I am assuming they are).
If they are using the same table, then they should "nicely" share their work, unless one of them takes to long. If it takes to long, the token will be claimed by another process. This other process in this case is the thread of the TEP of your other application instance. The "claim timeout" is defaulted to 10 seconds, which also corresponds with the long running event handling process.
This claimTimeout is adjustable though, by invoking the Builder of the JpaTokenStore/JdbcTokenStore (depending on which you are using / auto wiring) and calling the JpaTokenStore.Builder#claimTimeout(TemporalAmount) method. And, I think this would be required on your end, giving the fact you have a long running operation.
There are of course different ways of tackling this. Like, making sure the TEP is only ran on a single instance (not really fault tolerant though), or offloading this long running operation to a schedule task which is triggered by the event.
But, I think we've found the issue at least, so I'd suggest to tweak the claimTimeout and see if the problem persists.
Let us know if this resolves the problem on your end #dan!

How to make BizTalk only take one message at a time from the MSMQ

I have a BizTalk orchestration that is picking up messages from an MSMQ. It processes the message and sends it on to another system.
The thing is, whenever a message is put on the queue, BizTalk dequeues it immediately even if it is still processing the previous message. This is a real pain because if I restart the orchestration then all the unprocessed messages get deleted.
Is there any way to make BizTalk only take one message at a time, so that it completely finishes processing the message before taking the next one?
Sorry if this is an obvious question, I have inherited a BizTalk system and can't find the answer online.
There are three properties of the BizTalk MSMQ adapter you could try to play around with:
batchSize
Specifies the number of messages that the adapter will take off the queue at a time. The default value is 20.
This may or may not help you. Even when set to 1, I suspect BTS will try to consume remaining "single" messages concurrently as it will always try parallel processing, but I may be wrong about that.
serialProcessing
Specifies messages are dequeued in the order they were enqueued. The default is false.
This is more likely to help because to guarantee ordered processing, you are fundamentally limited to single threaded processing. However, I'm not sure if this will be enough on its own, or whether it will only mediate the ordering of message delivery to the message box database. You may need to enable ordered delivery throughout the BTS application too, which can only be done at design time (i.e. require code changes).
transactional
Specifies that messages will be sent to the message box database as part of a DTC transaction. The default is false.
This will likely help with your other problem where messages are "getting lost". If the queue is non-transactional, and moreover, not enlisted in a larger transaction scope which reaches down to the message box DB, that will result in message loss if messages are dequeued but not processed. By making the whole process atomic, any messages which are not committed to the message box will be rolled back onto the queue.
Sources:
https://msdn.microsoft.com/en-us/library/aa578644.aspx
While you can process the messages in order by using Ordered Delivery, there is no way to serialize to they way you're asking.
However, merely stopping the Orchestration should not delete anything, much less 'all the unprocessed messages'. Seems that's you problem.
You should be able to stop processing without losing anything.
If the Orchestration is going into a Suspended state, then all you need to do is Resume that one orchestration and any messages queued will remain and be processed. This would be the default behavior even if the app was created 'correctly' by accident ;).
When you Stop the Application, you're actually Terminating the existing Orchestration and everything associated with it, including any queued messages.
Here's your potential problem, if the original developer didn't properly handle the Port error, the Orchestration might get stuck in an un-finishable Loop. That would require a (very minor) mod to the Orchestration itself.

How do I specify a redelivery policy and separate retry queue processor in Rebus

I'm currently investigating Rebus but being unable to find good documentation this process is proving difficult. I am hoping someone can help me understand this exciting product.
I have read that during message processing, if something goes wrong the message will return to the queue.
Is the message returned to the front of the queue or placed on the end? If placed on the front this will be problem because the queue in essence becomes blocked with a message that may not be able to be processed - at least until it times out or retries exceeded.
Does Rebus have support for an out-of-the-box separate Retry queue?
Can I specify the interval between retries?
Can I specify an exponential backoff interval for retries as in Apache ActiveMQ?
Thanks
1) The queue transaction is rolled back, effectively moving the message back in front - therefore, it will be immediately retried.
After 5 failed attempts (at least that is the default), Rebus will move the message to the error queue. The default retry mechanism is intentionally very swift - this way, the input queue will never be clogged by poisonous messages.
If you need more sophisticated retries, I suggest you tage a look at bus.Defer - it can defer delivery of a message to the future. It requires that you have a timeout manager(*) running though.
2) I guess that's what I call "error queue", except there's no retry :)
I did create a solution some time, though, where I coded a simple endpoint that would periodically empty the error queue and move all the messages back into the original source queue, as a form of crude automatic second-level retry mechanism.
3) No. NServiceBus has the concept of second-level retries, but this is something that I've never really needed (enough) with Rebus. But with Rebus, you're on your own here - it should be fairly easy to do some intelligent bus.Defer that can then be easily adapted to each kind of error that you're expecting.
4) See (3)
I hope that clarifies a bit :)
(*) The timeout manager can be a separate endpoint whose only job in life is to receive a message, hold on to it for a while (i.e. save it to a database), and then return it to the sender when the time has elapsed. The timeout manager can be hosted in-process though, but using the .Timeouts(t => t.???) configuration spell.

Erlang: difference between using gen_server:cast/2 and standard message passing

I was working though a problem and noticed some code where a previous programmer was passing messages using the standard convention of PID ! Message. I have been using gen_server:cast/2. I was wondering if somebody could explain to me the critical differences and considerations when choosing between the two?
There are a few minor differences:
Obviously, the gen_server handles casts in handle_cast and "normal" messages in handle_info.
A cast never fails; it always returns ok. Sending a message with ! fails with badarg if you are sending a message to an atom that is currently not registered by a process. (Sending a message to a pid never causes an error, even if the process is dead.)
If the gen_server is running on a remote node that is not currently connected to the local node, then gen_server:cast will spawn a background process to establish the connection and send the message, and return immediately, while ! only returns once the connection is established. (See the code for gen_server:do_send.)
As for when to choose one or the other, it's mostly a matter of taste. I'd say that if the message could be thought of as an asynchronous API function for the gen_server, then it should use cast, and have a specific API function in the gen_server callback module. That is, instead of calling gen_server:cast directly, like this:
gen_server:cast(foo_proc, {some_message, 42})
make a function call:
foo_proc:some_message(42)
and implement that function like the direct cast above. That encapsulates the specific protocol of the gen_server inside its own module.
In my mind, "plain" messages would be used for events, as opposed to API calls. An example would be monitor messages, {'DOWN', Ref, process, Id, Reason}, and events of a similar kind that might happen in your system.
In addition to legoscia post I would say that it is easier to trace dedicated function API than messages. Especially in prod environment.

Resources