Biztalk Ordered Delivery failure - biztalk

We have a BizTalk application where the order of messages being inputted is very important and has to be kept, meaning they have to be outputted in the same order. Normally ordered delivery would do the trick here.
However I read that ordered delivery is only guaranteed when you connect a receive location directly to a send port. The moment you use orchestrations the order delivery isn't guaranteed anymore. Is there a way to work around or fix this? Because this kind of ruins our whole application and we've been working on this for months.
I read a work around from Microsoft where they use an extra field which has a counter and where they use an end orchestration which checks the counters. But this is way too much work for us to do now. So this work around is a no go. Plus not all messages are translated which creates holes in our flow and not all messages are coming from the same source either which makes this work around useless anyway.
Any other ideas?

Check out this page.
It explains that if you have an orchestration that follows the singleton pattern, so that only one instance of the orchestration exists, and you set the orchestration's receive port to ordered delivery, then you should get a valid end-to-end ordered delivery scenario.
To provide end-to-end ordered delivery the following conditions must be met:
Messages must be received with an adapter that preserves the order of the messages when submitting them to BizTalk Server. In BizTalk Server 2006, examples of such adapters are MSMQ, MQSeries, and MSMQT. In addition, HTTP or SOAP adapters can be used to submit messages in order, but in that case the HTTP or SOAP client needs to enforce the order by submitting messages one at a time.
You must subscribe to these messages with a send port that has the Ordered Delivery option set to True.
If an orchestration is used to process the messages, only a single instance of the orchestration should be used, the orchestration should be configured to use a sequential convoy, and the Ordered Delivery property of the orchestration's receive port should be set to True.

Resequencing strategies for ordered delivery in BizTalk:
I recently responded to a LinkedIn user's question regarding ordered delivery options in BizTalk.
I thought it would be useful for people to understand some of the strategies for re-sequencing messages using BizTalk.
Often as a BizTalk Developer, you are required to integrate with line-of-business systems which are unchangeable. This can be for one or more of many different reasons. As an example, the cost of changing a system can be too high or the vendor license states that support may be withdrawn if the system is changed.
This would not normally represent a problem where the vendor has provided a thoughtfully designed API as a point-of-integration endpoint. However, as many Integration Developers quickly learn, this is very rarely the case.
What do I mean by a thoughtfully designed API? Well, aside from all the SODA principles (service composition, fault contracts etc.), the most important feature of an API is to support the consumption of data which arrives in the wrong order.
This is a fairly simple thing to do. For example, if you are a vendor and you provide an HTTP operation as your integration point then one of the fields you could expose on your operation is a time-stamp or, even better, a sequence number. This means that if a call is made with an out-of-date payload, the relevant compensating mechanism can kick in, which can be as simple as discarding the data.
This article discusses what to do when the vendor has not built this feature into an API, and as an integrator you therefore are forced to implement end-to-end ordered delivery as part of your integration solution.
As stated in my response to the user's post on LinkedIn (see link above), in BizTalk ordered delivery in any but the simplest of scenarios is complicated at best and at worst can represent a huge cost in increased complexity, both in terms of development and support. The basic reason is that BizTalk is designed to be massively concurrent to enable high throughput, and there is a direct and unavoidable conflict between concurrency and ordering. Shoe-horning E2E ordered delivery into a BizTalk solution relies on artefacts such as singleton orchestrations which introduce complexity and increase both failure rate and cost-per-failure numbers.
A far better solution is to maintain concurrent processing to as near as possible to the line-of-business system endpoints, and then implement what is called a re-sequencer wrapper around each of the endpoints which require data to be delivered in the correct order.
How to implement such a wrapper in BizTalk depends on some factors, which are outlined in the following table:
|Sequencing field|Messages are deltas?|Database integration?|Wrapper strategy                  |
|----------------|--------------------|---------------------|----------------------------------|
|n of a total m  |N                   |Y                    |Stored procedure                  |
|n of a total m  |N                   |N                    |Singleton orchestration           |
|n of a total m  |Y                   |Y                    |Batched singleton orchestration   |
|n of a total m  |Y                   |N                    |Batched singleton orchestration   |
|Timestamp       |N                   |Y                    |Stored procedure                  |
|Timestamp       |N                   |N                    |Singleton orchestration           |
|Timestamp       |Y                   |Y                    |Buffer table with staggered reader|
|Timestamp       |Y                   |N                    |Buffer table with staggered reader|
The first factor, "Sequencing field", relates to the idea that in order to implement any kind of re-sequencer wrapper, as a minimum you will require that your message data contains some sequencing information. This can take the form of a source time-stamp; however a better, though rarer, kind of sequencing information consists of a sequence number combined with the total number of messages, for example, message 1 of 10, 2 of 10, etc.
The second factor, "Messages are deltas?", relates to whether the payload of your message contains a single state change to the data or the sum of all past changes to the data. Put another way, is it possible to reconstruct the full current state of the data from this message? If the message payload contains just a single change then it may not be possible to reconstruct the state of the data from the single message, and in this instance your message is a delta.
The third factor, "Database integration?", relates to whether the integration entry point to a system is a database. The reason this matters is that integrating at the database layer is a fairly common integration scenario, and if available can greatly simplify handling re-sequencing.
The strategies from the above table are described in detail below:
Stored procedure wrapper
This is the simplest of the resequencing strategies. A new stored procedure is created which queries the target data before making a decision about whether to update the target data. The decision can be as simple as "Is the data I have newer than the data in the target system?"
Of course, in order to implement this strategy, the target data also has to include the sequencing field of the source data, although an approximation can be made if necessary by relying on existing time-stamps which may already exist in the target data. The stored procedure wrapper can be contained either in the target database or ideally in a separate database.
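As a rough illustration of the "only write if newer" check, here is a minimal sketch that uses Python and an in-memory SQLite database as a stand-in for a real stored procedure. The table name customer, the column source_seq and both function names are assumptions made for this example; they are not taken from any BizTalk or vendor schema.

```python
import sqlite3


def make_target_db():
    """Create a hypothetical target table that also stores the source sequence."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        "id TEXT PRIMARY KEY, payload TEXT, source_seq INTEGER)"
    )
    return conn


def update_if_newer(conn, customer_id, payload, source_seq):
    """Write the payload only when it is newer than what the target holds.

    Returns True if the row was written, False if the message was stale.
    """
    with conn:  # check and write happen in one transaction
        row = conn.execute(
            "SELECT source_seq FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is not None and row[0] >= source_seq:
            return False  # stale or duplicate message: discard it
        conn.execute(
            "INSERT OR REPLACE INTO customer (id, payload, source_seq) "
            "VALUES (?, ?, ?)",
            (customer_id, payload, source_seq),
        )
    return True
```

In a real database the same comparison would live inside the stored procedure itself, so that the read and the write cannot be interleaved with another caller.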
Singleton orchestration wrapper
The idea behind this strategy is the singleton orchestration. This is a pattern you can implement to ensure that only a single instance of the orchestration will exist at any one time. There are many articles on the web demonstrating how to implement this pattern in BizTalk.
The core of the idea is that the singleton simply keeps track of the most recent successfully processed message sequence (or time-stamp). If the singleton receives a message which is older than the most recent sequence, it is simply discarded. This works because the messages are non-deltas, so the target system can commit only the most recent of a number of messages and the data will be in the most recent state. Only when data is committed successfully is the most recent sequence held by the singleton updated.
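The discard logic described above can be sketched outside BizTalk in a few lines of Python. This is only a model of what the singleton does, not orchestration code; the class name LatestOnlyGate and the commit callback are hypothetical.

```python
class LatestOnlyGate:
    """Model of the singleton: remembers the last committed sequence."""

    def __init__(self, commit):
        # commit is a hypothetical callable that writes to the target
        # system and raises an exception if the write fails.
        self._commit = commit
        self._last_seq = None

    def handle(self, seq, payload):
        """Return True if the message was committed, False if discarded."""
        if self._last_seq is not None and seq <= self._last_seq:
            return False  # older than (or same as) what was already committed
        self._commit(payload)
        # Only reached when the commit succeeded, so a failed commit
        # leaves the remembered sequence unchanged.
        self._last_seq = seq
        return True
```

Because the sequence is advanced only after the commit returns, a message whose commit failed can be retried without being mistaken for a stale one.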
Batched singleton orchestration wrapper
This strategy is based on the Singleton orchestration wrapper above, except it is more complex. Rather than only keep the most recent sequence information in memory the singleton is required to create and hold a working set of messages in memory which it will re-order and then process once all expected messages from the batch have arrived. This is because the messages are deltas so the target system MUST receive each message in the order they were intended. Once the batch has been sent successfully the singleton can terminate.
To do this it is a requisite of the source data that it contain a correlation identifier of some description which allows the batch of messages to be defined. For example, processing a defined set of orders from a customer, the inbound messages must contain an identifier for the customer. This can then be used to route the messages to the singleton orchestration instance correlated with this customer. Furthermore the message sequence field available must be of the n of a total m form.
Once the singleton is initialised it assembles a working set of messages in memory and proceeds to populate it as new messages arrive. One way I have seen this done is using a System.Collections.Generic.List as the container for the working set. Once the list has been fully populated (list length = m) then it is assumed all messages in the batch have been received and the orchestration then loops over the working set in sequence and processes the messages into the target system.
One of the benefits of the batched singleton orchestration wrapper is it allows concurrent processing by correlation identifier. In the example above this means that messages from two customers would be processed concurrently.
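A minimal Python model of the batched working set might look like the following. It keeps one working set per correlation identifier and releases a batch only once all m messages have arrived. The names BatchResequencer and deliver are hypothetical, and a dict keyed by sequence number is used here in place of the List mentioned above, purely so that a duplicate message overwrites rather than inflates the count.

```python
from collections import defaultdict


class BatchResequencer:
    """Holds 'n of m' messages per correlation id until the batch is complete."""

    def __init__(self, deliver):
        # deliver is a hypothetical callable: deliver(correlation_id, n, payload)
        self._deliver = deliver
        self._batches = defaultdict(dict)  # correlation id -> {n: payload}

    def receive(self, correlation_id, n, m, payload):
        """Store one message; flush the batch in order once all m are present.

        Returns True when this message completed the batch.
        """
        batch = self._batches[correlation_id]
        batch[n] = payload
        if len(batch) < m:
            return False  # still waiting for the rest of the batch
        for seq in sorted(batch):
            self._deliver(correlation_id, seq, batch[seq])
        del self._batches[correlation_id]  # the batch is done
        return True
```

Batches for different correlation identifiers never wait on each other, which mirrors the per-customer concurrency described above.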
Buffer table with staggered reader wrapper
Arguably the most complex of the strategies presented, this solution is to be used when you have delta messaging with a time-stamp-based sequencing field. It can be implemented with a database of some description which acts as a re-sequencing buffer.
It is worth noting here that this re-sequencing wrapper does not guarantee ordered delivery, but used well it makes ordered delivery highly likely.
As messages arrive, they are written into the buffer and in the same operation the buffer is reordered, so that the order of messages held in the buffer is always correct.
To create the buffer reader, have a receive location which reads the messages in the buffer before passing the messages to a send port with ordered delivery enabled, which then will process the messages into the target system. You can also use a singleton orchestration as an intermediary if your target system's API semantics are too complex for a send port.
However, using this wrapper as I have described it above will not enable ordered delivery, as the messages will almost certainly be committed to the buffer in the wrong order, which will result in the messages being processed into the target system in the same (wrong) order. This is where the staggered query comes in. This is a fancy way of saying your buffer query needs to only select data at intervals of time T, AND only select those rows where the row-number is lower than buffer total row count minus C.
This has the effect of allowing sequencing to occur over an appropriate timespan. T will be familiar to most BizTalk developers as the polling interval of some adapters (such as the WCF-SQL adapter). C is slightly more difficult to set, but by increasing this number you are reducing the chances that when you poll, you will miss a message older than the most recent one in your retrieved data set.
What T and C are depends on many things, although these values should be based on your latency SLA and your message volume (or throughput). As a guideline, if you have a SLA to deliver data into your target system within 30 seconds and you process 10 messages per second then T should be around 10 seconds and C should be around 100 rows.
Of course this only works if your messages for a given correlation id are sent by the source system during a short space of time (ideally back-to-back). The longer the interval between sends, the greater the required value of C, and the less effective the wrapper becomes.
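To make the staggered query concrete, here is a small sketch using Python with SQLite as the buffer. The table name buffer, its columns, and the function names are assumptions for illustration only. The caller is expected to invoke poll_buffer once every T seconds; hold_back plays the role of C.

```python
import sqlite3


def make_buffer():
    """Create a hypothetical re-sequencing buffer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE buffer ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, source_ts INTEGER, payload TEXT)"
    )
    return conn


def add_to_buffer(conn, source_ts, payload):
    with conn:
        conn.execute(
            "INSERT INTO buffer (source_ts, payload) VALUES (?, ?)",
            (source_ts, payload),
        )


def poll_buffer(conn, hold_back):
    """Read and remove the oldest rows, leaving the newest hold_back rows behind.

    The rows left behind give late-arriving older messages a chance to land
    in the buffer before anything newer than them is released.
    """
    with conn:
        total = conn.execute("SELECT COUNT(*) FROM buffer").fetchone()[0]
        ready = max(total - hold_back, 0)
        rows = conn.execute(
            "SELECT id, source_ts, payload FROM buffer "
            "ORDER BY source_ts, id LIMIT ?",
            (ready,),
        ).fetchall()
        conn.executemany(
            "DELETE FROM buffer WHERE id = ?", [(row[0],) for row in rows]
        )
    return [(row[1], row[2]) for row in rows]
```

Reading the guideline figures this way, C = 100 rows at 10 messages per second holds each message back for roughly 10 seconds, and with T = 10 seconds a message should wait at most about 20 seconds before it is read, which leaves some headroom inside a 30-second SLA.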
One of the benefits of this strategy is you can also perform de-duplication of messages in the buffer if your data source is prone to sending duplicate messages and your target system endpoint is not idempotent. You can also use the buffer to implement FILO and other non-standard queueing semantics.
Conclusions
The strategies I have discussed here are ways of bending BizTalk to a task which it wasn't designed to do. As a result each has caveats around cost and complexity to support, and also may not work in certain scenarios. I would like to hear from anyone who has implemented other patterns for ordered delivery in BizTalk.

Related

Seek to an offset via an external trigger

Currently I use the AcknowledgingMessageListener to implement a Kafka consumer using spring-kafka. This implementation helps me listen on a specific topic and process messages with a manual ack.
I now need to build the following capability:
Let us assume that, because of some environmental exception or some bad data entering via this topic, I need to replay data on a topic from and to a specific offset. This would be a manual trigger (mostly via the execution of a Java class).
It would be ideal if I could retrieve the messages between those offsets and feed them into a replay topic so that a new consumer can process those messages, thus keeping the offsets intact on the original topic.
1. ConsumerSeekAware interface - if this is the answer, how can I trigger this externally? Via, let's say, a mvn -Dexec. I am not sure if this is even possible.
2. Also, let's say that I have a crash timestamp with me; is it possible to introspect the topic to find the offset corresponding to the crash so that I can replay from that offset?
3. Can I find offsets corresponding to some specific data so that I can replay those specific offsets?
All of these requirements are towards building a resilience layer around our Kafka capabilities. I need all of these to be managed by a separate executable class that can be triggered manually providing the relevant data (like time stamps etc). This class should determine offsets and then seek to that offset, retrieve the messages corresponding to those offsets and post them to a separate topic. Can someone please point me in the right direction? I’m afraid I’m going around in circles.
so that a new consumer can process those messages thus keeping the offsets intact on the original topic.
Just create a new listener container with a different group id (new consumer) and use a ConsumerAwareRebalanceListener (or ConsumerSeekAware) to perform the seeks when the partitions are assigned.
Here is a sample CARL that seeks all assigned topics based on a timestamp.
You will need some mechanism to know when the new consumer should stop consuming (at which time you can stop() the new container). Maybe set max.poll.records=1 on the new consumer so he doesn't prefetch past the failure point.
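To show what "seek by timestamp, then replay up to a stop offset" amounts to, here is a plain Python model of a single partition. It is not Kafka client code: the partition is just a list, and offset_for_time and replay are hypothetical helpers. The lookup follows the rule documented for the Kafka consumer's offsetsForTimes call (the earliest offset whose timestamp is greater than or equal to the requested one), and the model additionally assumes timestamps never decrease along the partition, which a real topic does not guarantee.

```python
from bisect import bisect_left


def offset_for_time(timestamps, target_ts):
    """Return the earliest offset whose timestamp is >= target_ts, else None.

    timestamps[i] is the timestamp of the record at offset i and is
    assumed to be non-decreasing (an assumption of this model).
    """
    index = bisect_left(timestamps, target_ts)
    return index if index < len(timestamps) else None


def replay(records, start_offset, end_offset, publish):
    """Copy records in [start_offset, end_offset] to a replay destination.

    publish is a hypothetical callable standing in for a producer that
    writes to the separate replay topic; the source list is not modified.
    """
    for offset in range(start_offset, end_offset + 1):
        publish(records[offset])
```

With a real consumer, the start offset would come from the timestamp lookup and the seek would be performed when partitions are assigned, as described above; the end offset is the "mechanism to know when to stop".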
I am not sure what you mean by #3.

How do I guarantee task order processing for a queue with multiple consumers in RabbitMQ?

Say I want to start friendship between A and B.
Say I want to end friendship between A and B.
Those are two tasks I want to send to a queue having multiple consumers (workers).
I want to guarantee processing order, so how do I prevent the second task from being performed before the first?
My solution: make tasks sticky (tasks about A are always sent to the same consumer).
Implementation: use RabbitMQ's exchanges and map tasks to the available consumers.
How do I map A to its consumer? I'm thinking about nginx's ip_hash. I think I need something similar.
I don't know if it is relevant but A and B are uuid.v4() UUIDs.
Can you point me to the algorithm I need to accomplish this mapping, please?
Well, there are two options:
Make one exchange / queue for all events and guarantee that they're going to be inserted in the proper order. Create one worker for them. This costs more when inserting data (and doesn't give you the option of scalability).
Prepare your app for such a situation, e.g. when you get a destroyFriendship message and the friendship does not exist, save a message to the db recording the future ending of the friendship. Then you can have multiple workers making and destroying friendships and do not have to care about proper order. Simply do your job, make friends, and if there's a row in the db about the ending of the friendship, destroy it (or simply do not create it). Of course you need to check the timestamps of the creation and destruction and check that the destruction time was after the creation time!
Of course you can compute some hash of A/B, but IMO it would be more costly than preparing the app. Scaling an app using exchanges/queues is not really good: you're going to create more and more queues and it's going to end up with too many queues/exchanges in RabbitMQ.
If you have to use the solution you specified, you can for example compute crc32 from A and B, and use its value to calculate which queue the task should be sent to. But having multiple consumers might give wrong results here: what if one of the consumers is blocked somehow and another receives the message destroying the friendship? Using this solution I'd say that it's dangerous to have more than 1 worker per group of A/B.
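For the crc32 idea, a minimal Python sketch could look like this. The function name queue_for_pair and the queue_count parameter are made up for the example; the two UUIDs are sorted first so that (A, B) and (B, A) land on the same queue.

```python
import zlib


def queue_for_pair(a, b, queue_count):
    """Map an unordered pair of ids to a queue index in [0, queue_count)."""
    first, second = sorted((a, b))  # make the key independent of argument order
    key = f"{first}:{second}".encode("utf-8")
    return zlib.crc32(key) % queue_count
```

Each queue index would then be bound to exactly one consumer, in line with the warning above. Note that changing queue_count remaps most pairs, so tasks already queued under the old mapping should be drained before resizing.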

How are the chances of getting "read-your-writes" consistency increased in Dynamo?

In Section 5 of Dynamo paper, there is the following content:
In particular, since each write usually follows a read operation, the
coordinator for a write is chosen to be the node that replied fastest to the
previous read operation which is stored in the context information of the
request. This optimization enables us to pick the node that has the data that
was read by the preceding read operation thereby increasing the chances of
getting "read-your-writes" consistency.
How are the chances of getting "read-your-writes" consistency increased?
"read-your-writes" means that a read following a write gets the value set by the
write. The read and the write are performed by two different clients for this
context. The reason is that the choice of the write coordinator does not impact
on the chances of getting "read-your-writes" by the same client.
But the above text is talking about a write following a read. Here is my guess.
The read coordinator will try to do syntactic reconciliation if it is possible.
If syntactic reconciliation is impossible because of divergent versions, the
client needs to do semantic reconciliation before doing a write. Either way, the
versions on all the nodes involved in the read operation are ancestors of the
reconciled version. So the following write can be sent to any of them to get
applied. The earliest time for a write to be seen by a read is after the
following steps are finished:
The client contacts the write coordinator.
The write coordinator generates the version clock for the new version.
The write coordinator writes the new version locally.
The shorter the time to perform the above steps, the more likely a
following read sees the new version. The node which replied fastest to the
previous read is likely to perform these steps in a shorter time, so such a
node is chosen as the write coordinator.
Section 2.3 talks about performing the reconciliation at read time rather than write time.
Data versioning - "One can determine whether two versions of an
object are on parallel branches or have a causal ordering, by
examine their vector clocks."
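As a small illustration of that comparison, here is a Python sketch that treats a vector clock as a dict from node name to counter. The function name compare and the four result labels are choices made for this example, not terminology from the paper.

```python
def compare(a, b):
    """Compare two vector clocks given as {node: counter} dicts.

    Returns "equal", "before" (a is an ancestor of b), "after"
    (b is an ancestor of a), or "concurrent" (parallel branches).
    Nodes missing from a clock are treated as counter 0.
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

Two versions that compare as "concurrent" are the divergent branches that cannot be reconciled syntactically; "before" and "after" mean one version simply supersedes the other.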
This paragraph from section 4. [emphasis mine]
In Dynamo, when a client wishes to update an object, it must specify
which version it is updating. This is done by passing the context it
obtained from an earlier read operation, which contains the vector
clock information. Upon processing a read request, if Dynamo has
access to multiple branches that cannot be syntactically reconciled,
it will return all the objects at the leaves, with the corresponding
version information in the context. An update using this context is
considered to have reconciled the divergent versions and the
branches are collapsed into a single new version
So by performing the read first, you're effectively reconciling all divergent versions prior to writing. By writing to that same node, the version you've updated is marked with the context and vector clock of the most up to date version and all divergent branches can be collapsed. This is sent to the top N nodes (as you've stated) as fast as possible. But by removing the divergent branches - you reduce the chance that multiple values could be returned. You only need one of the N nodes read in the next read to get the reconciled write. ie - the node as part of the quorum of R reads says - "I am the reconciled version, and all others must bow to me". (and if that has already been distributed to another of the "R" nodes, then there's even greater chance of getting the reconciled version in the quorum)
But, if you wrote to a different node, one that you hadn't read from - the vector clock that is being updated may not necessarily be a reconciled version of the object. Therefore, you could still have divergent branches. The following read will try to reconcile them, but it's more probable that you will have multiple divergent versions and no reconciliation.
If you've made it this far, I think the most interesting part is that per Section 6, client applications can dictate the values of N, R and W - ie - number of nodes that constitute the pool to draw from, and the number of nodes that must agree on a read or write for it to be successful.
Geez - my head hurts now.
I re-read the Dynamo paper. I have a new understanding of "read-your-writes" consistency. "read-your-writes" involves only one client. Imagine the following requests performed by one client on the same key:
read-1
write-1
read-2
"read-your-writes" means that read-2 sees write-1. The write coordinator has the best chance of having write-1. To ensure "read-your-writes", it is desirable that the write coordinator replies fastest to read-2. It is highly likely that the node that replied fastest to read-1 will also reply fastest to read-2. So the node that replied fastest to read-1 is chosen as the write coordinator.
And what is "the node that replied fastest to the previous read operation"? Such a node only makes sense if client-driven coordination is used. For server-side coordination, the coordinator node replies to the client and the other involved nodes reply to the coordinator node, so "replied fastest" is meaningless in this case.

RPC semantics what exactly is the purpose

I was going through the RPC semantics, at-least-once and at-most-once. How do they work?
Couldn't understand the concept of their implementation.
In both cases, the goal is to invoke the function once. However, the difference is in their failure modes. In "at-least-once", the system will retry on failure until it knows that the function was successfully invoked, while "at-most-once" will not attempt a retry (or will ensure that there is a negative acknowledgement of the invocation before retrying).
As to how these are implemented, this can vary, but the pseudo-code might look like this:
At least once:
    request_received = false
    while not request_received:
        send RPC
        wait for acknowledgement with timeout
        if acknowledgement received and acknowledgement.is_successful:
            request_received = true
At most once:
    request_sent = false
    while not request_sent:
        send RPC
        request_sent = true
        wait for acknowledgement with timeout
        if acknowledgement received and not acknowledgement.is_successful:
            request_sent = false
An example case where you want to do "at-most-once" would be something like payments (you wouldn't want to accidentally bill someone's credit card twice), whereas an example case of "at-least-once" would be something like updating a database with a particular value (if you happen to write the same value to the database twice in a row, that really isn't going to have any effect on anything). You almost always want to use "at-least-once" for idempotent operations (those where repeating the call leaves the same end state, which includes all non-mutating operations); by contrast, most non-idempotent mutating operations (ones that incrementally mutate the state and are thus dependent on the current/prior state when applying the mutation) would need "at-most-once".
I should add that it is fairly common to implement "at most once" semantics on top of an "at least once" system by including an identifier in the body of the RPC that uniquely identifies it and by ensuring on the server that each ID seen by the system is processed only once. You can think of the sequence numbers in TCP packets (ensuring the packets are delivered once and in order) as a special case of this pattern. This approach, however, can be somewhat challenging to implement correctly on distributed systems where retries of the same RPC could arrive at two separate computers running the same server software. (One technique for dealing with this is to record the transaction where the RPC is received, but then to aggregate and deduplicate these records using a centralized system before redistributing the requests inside the system for further processing; another technique is to opportunistically process the RPC, but to reconcile/restore/rollback state when synchronization between the servers eventually detects this duplication... this approach would probably not fly for payments, but it can be useful in other situations like forum posts).
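The identifier-based pattern can be sketched in Python as follows, for the simple single-server case only. DedupingServer, charge and call_at_least_once are hypothetical names; the client retries until it gets an acknowledgement, and the server uses the request ID to make sure a retried request is applied only once.

```python
import uuid


class DedupingServer:
    """Single-process model of a server that applies each request ID once."""

    def __init__(self):
        self._results = {}  # request ID -> result of the first execution
        self.balance = 0

    def charge(self, request_id, amount):
        if request_id in self._results:
            # A retry of something already processed: replay the old result.
            return self._results[request_id]
        self.balance += amount
        self._results[request_id] = self.balance
        return self.balance


def call_at_least_once(send, max_attempts=5):
    """Retry send(request_id) with the same ID until it is acknowledged.

    send is a hypothetical transport function that raises TimeoutError
    when no acknowledgement arrives in time.
    """
    request_id = str(uuid.uuid4())  # one ID shared by every retry
    for _ in range(max_attempts):
        try:
            return send(request_id)
        except TimeoutError:
            continue  # no acknowledgement: try again with the same ID
    raise TimeoutError("no acknowledgement after retries")
```

The important detail is that the ID is generated once per logical call, not once per attempt; otherwise the server could not tell a retry from a new request. The multi-server case discussed above needs shared or reconciled storage for the seen IDs, which this sketch does not attempt.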

Starting multiple orchestrations from parent orchestration and passing messages to them

I have a situation where a main orchestration is responsible for processing a convoy of messages. These messages belong to a set of customers, the orchestration will read the messages as they come in, and for each new customer id it finds, it will spin up a new orchestration that is responsible for processing the messages of a particular customer. I have to preserve the order of messages as they come in, so the newly created orchestrations should process the message it has and wait for additional messages from the main orchestration.
I tried different ways to tackle this, but was not able to successfully implement it.
I would like to hear your opinions on how this could be done.
Thanks.
It sounds like what you want is a set of nested convoys. While it might be possible to get that working, it's going to... well, hurt. In particular, my first worry would be maintenance: any changes to the process would be a pain in the neck to make, and, much worse, deployment would really, really suck.
Personally, I would really try to find an alternative way to implement this and avoid the convoys if possible, but that would depend a lot on your specific scenario.
A few questions, if you don't mind:
What are your ordering requirements? For example, do you only need ordered processing for each customer on a single incoming batch, or across batches? If the latter, could you make do without the master orchestration and just force a single convoy'd instance per customer? Still not great, but would likely simplify things a lot.
What are your failure requirements with respect to ordering? Should it completely stop processing? Save the message and keep going? What about retries?
Is ordering based purely on the arrival time of the message? Is there anything in the message that you could use to force ordering internally instead of relying purely on the arrival time?
What does the processing of the individual messages do? Is the ordering requirement only to ensure that certain preconditions are met when a specific message is processed (for example, messages represent some tree structure that requires parents are processed before children).
I don't think you need a master orchestration to start up the sub-orchestrations. I am assuming you are not talking about the master orchestration implementing a convoy pattern. So, if that's the case, here's what I might do.
There is a brief example here on how to implement a singleton orchestration. This example shows you how to set up an orchestration that will only ever exist once. All the messages going to it will be lined up in order of receipt and processed one at a time. Your example differs in that you want to have this done by customer ID. This is pretty simple. Promote the customer ID in the inbound message and add it to the correlation type. Now, there will only ever be one instance of the orchestration per customer.
The problem with singletons is this. You have to kill them at some point or they will live forever as dehydrated orchestrations. So, you need to have them end. You can do this if there is a way for the last message for a given customer to signal the orchestration that it's time to die, through an attribute or such. If this is not possible, then you need to set a timer. If no messages are received in x seconds, terminate the orchestration. This is all easy to do, but it can introduce Zombies. Zombies occur when that orchestration is in the process of being shut down when another message for that customer comes in. This can usually be mitigated by tweaking the time to wait. Regardless, it will cause the occasional Zombie.
A note from the field. We've done this and it's really not a great long-term solution. We were receiving customer info updates and we had to ensure ordered processing. We did this singleton approach and it's been problematic because of the Zombie issue and the exception issue. If the singleton orchestration throws an exception, it will block the processing of all future messages for that customer. So, handle every single possible exception. The real solution would have been to have the far end system check the time stamps from the update messages and discard ones that were older than the last update. We wanted to go this way, but the receiving system didn't want to do this extra work.

Resources