Is there an alternative to "ordered delivery" on a send port in BizTalk? Message sequence is very important to me, so I created an orchestration that suspends a message when it is out of sequence and resumes it when it is in sequence. I use a long-running orchestration and direct port binding.
Some messages are processed faster in the send pipeline than others, so it happens that the messages sometimes end up out of sequence (I use the file adapter...).
When I check "ordered delivery", the messages stay in sequence no matter what, but performance is really bad (messages pile up in the send ports), so I need to find an alternative to ordered delivery on the send port.
Any suggestions?
Thanks.
Ordered delivery obviously adds a lot of overhead because of the FIFO pattern. Take a look at this article, in particular the FIFO article in the first issue. Also look at BizTalk performance in general to help speed up other areas of your solution. I've seen a few people try their own custom ordering solutions in .NET and SQL, and performance wasn't much better, because the ordering pattern simply takes time to process. Also take a look at these resources on performance in general:
Considerations when planning a perf test -
http://msdn2.microsoft.com/en-us/library/aa972201.aspx
BizTalk 2006 adapter performance numbers -
http://msdn2.microsoft.com/en-us/library/aa972200.aspx
If your transport in or out is SOAP, read this scalability study -
http://msdn2.microsoft.com/en-us/library/aa972198.aspx
Good proof points for BizTalk performance in relation to infrastructure
setup - http://msdn2.microsoft.com/en-us/library/ms864801.aspx
Do you have multiple locations the data is being sent to? In other words, does everything have to be in one sequence, or could the traffic be partitioned? If so, you can use correlation and ordered delivery with several pipes delivering in parallel to speed the process up.
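To illustrate the partitioning idea outside of BizTalk, here is a minimal C# sketch (just the concept, not a BizTalk API): messages sharing a key stay in FIFO order, while different keys flow in parallel.

```csharp
// Conceptual sketch only - not a BizTalk API. Messages that share a
// partition key (e.g. a destination) are delivered in FIFO order by a
// dedicated consumer, while different keys are processed in parallel.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class PartitionedDispatcher
{
    private readonly ConcurrentDictionary<string, BlockingCollection<string>> _partitions =
        new ConcurrentDictionary<string, BlockingCollection<string>>();

    public void Enqueue(string partitionKey, string message)
    {
        var queue = _partitions.GetOrAdd(partitionKey, key =>
        {
            var q = new BlockingCollection<string>();
            // One single-threaded consumer per key preserves order within
            // the partition; partitions run concurrently with each other.
            Task.Run(() =>
            {
                foreach (var msg in q.GetConsumingEnumerable())
                    Console.WriteLine($"[{key}] delivering {msg}");
            });
            return q;
        });
        queue.Add(message);
    }
}
```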
This may seem a silly question, but I am really confused about ZeroMQ's terminology regarding synchronous sockets like REQ and REP.
By my understanding, synchronous communication occurs when a client sends a message and then blocks until the response arrives. If ZeroMQ implemented synchronous communication, then a .send() method alone would be enough for a synchronous socket.
I think ZeroMQ's "synchronous socket" terminology refers only to the inability to send more messages until the response to the last message arrives; the "sender" can still continue its processing (doing more stuff) asynchronously.
Is this true?
In that case, is there any straightforward way to implement a synchronous communication using ZeroMQ?
EDIT: Synchronous communication makes sense when I want to invoke a method in a remote process (like RPC). If I want to execute a series of commands in a remote process, and each command needs the result of the previous one to do its job, then asynchronous communication is not the best option.
To use ZMQ for implementing a synchronous framework, you can very nearly do it using just ZMQ: you can set the high water mark to 1. Unfortunately that's not quite it; what you want is an outgoing queue length of 0. Even more unfortunately, setting the high water mark to 0 is interpreted by ZMQ as infinity...
So the only option is to implement a synchronous transfer protocol on top of ZMQ. That's not very difficult to do. The conversation between the two ends will be something like "can I send?", "yes, you can send now", "ok, here it is", "ok, I have received it", at which point both ends return to their callers (or at least the programmatic version of that). This sets up what is called an execution rendezvous: both ends know that they have both reached a certain point of execution.
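A minimal sketch of that rendezvous, written against NetMQ (a .NET ZeroMQ binding; the API is assumed here, and the endpoint address and payloads are illustrative only). REQ/REP already enforces the lock-step, and the final acknowledgement completes the rendezvous:

```csharp
// Sketch (assumed NetMQ API): an execution rendezvous on top of REQ/REP.
using System;
using NetMQ;
using NetMQ.Sockets;

class Rendezvous
{
    static void Main()
    {
        using (var server = new ResponseSocket("@tcp://127.0.0.1:5557")) // bind
        using (var client = new RequestSocket(">tcp://127.0.0.1:5557"))  // connect
        {
            // REQ/REP already enforces lock-step: the client cannot send
            // again until it has received the reply to its last request.
            client.SendFrame("can I send?");
            Console.WriteLine(server.ReceiveFrameString()); // "can I send?"
            server.SendFrame("yes, you can send now");
            Console.WriteLine(client.ReceiveFrameString());

            client.SendFrame("ok, here it is");
            Console.WriteLine(server.ReceiveFrameString());
            server.SendFrame("ok, I have received it");     // final ack
            Console.WriteLine(client.ReceiveFrameString()); // rendezvous complete
        }
    }
}
```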
Technically speaking, what you're doing is taking ZeroMQ (Actor Model) and turning it into something more like Communicating Sequential Processes.
RPC
Having said all that, from your edit I think you might like to consider Cap'n Proto. This is a C++ serialisation technology that has a neat RPC trick: if the return value from one RPC call is the input to another, you can chain them all together in advance (see here).
Let's start with a first step: forget everything you know about sockets.
ZeroMQ is more a concept of thinking about distributed systems (multi-agent-like) and how to design software with the use of such a smart signalling/messaging framework.
This is the core aim of ZeroMQ: to let designers keep thinking in the application domain, while all the low-level dirty work is operated for them without much need for their care.
If one has just recently started with ZeroMQ, one may enjoy a short read about the ZeroMQ global view first, before discussing details.
Having read and understood the concept of the ZeroMQ hierarchy, it is much simpler to move on to details:
Given that a local Context() instance is a data-pumping engine, and having the REQ/REP Scalable Formal Communication Archetype pattern in mind, the story is now actually a story about a network of distributed finite-state automata.
A local process operating just one side of the distributed REQ/REP communication archetype has zero power to make the remote process receive, or not, the message that was passed from the local process over to the ZeroMQ delivery services towards the intended recipient(s) in fair belief. Still less can the local process influence whether the remote process intends to respond at all, so welcome to the realms of distributed multi-agent games.
Both the REQ and the REP formal behaviours have to meet their { local | distributed-mode }-expected sort of behaviour: REQ asks first, REP answers then, so as to keep the contracted promise. The point is that this behaviour is distributed, split between a pair of nodes, and there are cases when network incidents may throw the distributed FSA into an unsalvageable mutual deadlock (one may find quite a few posts on this here on zeromq).
So your local-side REQ code imperatively .send()-s, but has no guarantee of anything reasonable happening until the REP side .recv( zmq.NOBLOCK )-s, or doesn't (no one has any kind of warranty that a remote node exists at all; similarly, one has to make oneself ready to anticipate and handle all cases where the remote side never responds, so many "new" challenges appear from the nature of a distributed multi-agent ecosystem).
There are smart ways to handle this new breed of distributed chaos and uncertainty: best to use .poll() and the non-blocking forms of both the .send() and .recv() methods, as these let user code remain capable of handling all expected and unexpected events in due time and fashion.
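A sketch of that non-blocking style, again assuming the NetMQ binding (timeout and address are illustrative): instead of blocking forever in .recv(), the local agent bounds its wait and stays able to act when the remote side never answers.

```csharp
// Sketch (assumed NetMQ API): bounded waiting instead of a blocking .recv().
using System;
using NetMQ;
using NetMQ.Sockets;

class PollingRequester
{
    static void Main()
    {
        using (var req = new RequestSocket(">tcp://127.0.0.1:5558"))
        {
            req.SendFrame("request");

            // Wait at most 2 seconds for a reply instead of blocking forever.
            if (req.TryReceiveFrameString(TimeSpan.FromSeconds(2), out string reply))
            {
                Console.WriteLine($"got: {reply}");
            }
            else
            {
                // Remote side absent or slow: recover instead of deadlocking.
                // Note: a REQ socket that missed its reply is stuck in the
                // REQ/REP state machine and typically has to be closed and
                // re-created before it can send again.
                Console.WriteLine("no reply in time, taking recovery action");
            }
        }
    }
}
```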
One may also operate many co-existent ZeroMQ connections, so as to prioritise and specialise each form of the multi-agent interactions in a distributed system design, even designing in fault-resilience and similar high-level robustness concepts. Here the asynchronous nature of each interaction avoids any need for coordination or synchronisation with a remote (possibly not-yet-present) agent, which is principally an autonomous entity with its own domain of control, and thus principally asynchronous to whatever the local-side agent might "expect"; the local side has no influence in any form except attempting to send a message "telegram" there.
So yes, ZeroMQ is an asynchronous, brokerless signalling/messaging framework.
For (almost) synchronous communications, one may take steps and measures to trim down the (principally distributed) asynchronous control loops. Best to update your post with an MCVE and details of the particular goals you want to achieve.
In a Rebus service bus, there is a single message transport queue per endpoint. It is possible for an endpoint to handle more than one message type, and it is possible to have only a single endpoint in a system.
Other than the throughput of messages, what reasons are there to use more than a single endpoint in a Rebus service bus system?
Excellent question! :) There can be many reasons why you might want to have several Rebus endpoints active at the same time.
An obvious reason is that you might want to host the endpoints in separate processes so you can update them independently of each other. But since this reason is pretty obvious, I assume you are thinking about reasons one might want to host multiple Rebus endpoints in the same process.
Let me just mention a few(*):
Concurrency requirements
One endpoint might be hosting data that experiences contention and therefore does not benefit from being able to process messages concurrently. This endpoint will probably have only a few threads and low parallelism, possibly 1/1 (workers/parallelism).
Another endpoint might be doing stream-based data processing (e.g. loading blobs from one place into another, downloading data from web services, etc.), which can be done with very high throughput and low resource requirements with one single thread and a high level of parallelism - e.g. 1/20.
Yet another endpoint might be doing a lot of serialization/deserialization, which is usually CPU-bound, and therefore might benefit from running on a many-core box with many worker threads and matching parallelism - e.g. 10/10.
As you can see, the type of tasks performed by an endpoint can call for a configuration that matches the nature of the tasks.
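As a rough illustration, here is how such per-endpoint tuning might look with Rebus's configuration API (sketched from memory, so treat the exact calls as assumptions; the queue name and numbers are illustrative):

```csharp
// Sketch (assumed Rebus configuration API): match workers/parallelism to the
// nature of the endpoint's work.
using System;
using Rebus.Activation;
using Rebus.Config;

class ContendedEndpoint
{
    static void Main()
    {
        using (var activator = new BuiltinHandlerActivator())
        {
            Configure.With(activator)
                .Transport(t => t.UseMsmq("contended-data"))
                .Options(o =>
                {
                    o.SetNumberOfWorkers(1); // "1/1": serialize access to
                    o.SetMaxParallelism(1);  // data that suffers contention
                })
                .Start();

            Console.ReadLine(); // keep the endpoint running
        }
    }
}
```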
SLAs
One endpoint might be designated for processing low-priority background stuff, like e.g. moving data to cold storage, optimizing storage of historic data, etc.
Another endpoint might be processing messages where low latency is the most important quality attribute.
If these two shared the same queue, the low-priority background work could sometimes clog up the queue, hindering low-latency processing of the other messages.
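A sketch of that separation (again assuming the Rebus API from memory; queue names are illustrative): two endpoints hosted in the same process, each with its own queue, so background traffic can never delay the latency-sensitive one.

```csharp
// Sketch (assumed Rebus API): two endpoints in one process, separate queues.
using System;
using Rebus.Activation;
using Rebus.Config;

class TwoEndpointsOneProcess
{
    static void Main()
    {
        var backgroundActivator = new BuiltinHandlerActivator();
        var lowLatencyActivator = new BuiltinHandlerActivator();

        // Low-priority background work (cold storage moves, archiving, ...).
        Configure.With(backgroundActivator)
            .Transport(t => t.UseMsmq("background-work"))
            .Start();

        // Latency-sensitive messages get their own queue and workers.
        Configure.With(lowLatencyActivator)
            .Transport(t => t.UseMsmq("low-latency"))
            .Start();

        Console.ReadLine(); // keep both endpoints running
    }
}
```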
Logical separation
I have many times started out by hosting several Rebus endpoints in the same process because it was easy to deal with during development, while keeping the endpoints separate because they were implementing different business functions.
This way, it is easy to physically break them apart some time later on, allowing a higher degree of separation and independence.
(*) Udi Dahan works with the concepts of "business components" and "autonomous components", where the first is an implementation of a business capability and the second is what business components are decomposed into, mostly for technical reasons.
I guess you could say that the first two reasons I mentioned are separate endpoints for "autonomous component" reasons, whereas the third is separation because things belong to different business components.
Udi keeps a pretty strict view of these concepts that is completely orthogonal to how the system is physically composed, but I almost always end up with pretty high convergence between logical separation and physical separation.
I discovered Rebus contains FileSystemMessageQueue. It seems too good to be true, so I wanted to ask a few questions about it :)
Is it thread-safe/process-safe?
Is it transactional?
Why does it use JSON as the serialization format (doesn't that add limitations to POCOs compared to a binary serializer)?
Could it work separately, without the bus (just as a separate DLL, not a service)?
For a small volume of messages, could it be a replacement for MSMQ? I mean, how does it compare to MSMQ for local (non-networked), not resource-intensive messaging? Would it be as good as MSMQ?
Thanks in advance
<disclaimer> The FileSystemMessageQueue started out as a fun experiment because I wanted to use Dropbox as a transport - which actually seems to work, but I have not tested it in any way, except for making the transport pass Rebus' usual transport contract tests and showing it off at a couple of user group meetings and such :)
Therefore: please understand that you'll be the one testing the transport, and if you do use it, you'll almost immediately be the person in the world with the most experience in using it :)
</disclaimer>
1) The transport keeps track of which message files are currently being handled to ensure that the same file is not received twice, so you can safely have multiple threads receiving messages in the same endpoint.
You cannot have competing consumers though, because there's currently no locking that can span multiple processes (it could probably be done, though, by using the OS to lock the files and keeping the file handle open for the time it takes to handle the message).
2) No. It satisfies the same at-least-once delivery guarantee as all the other transports in Rebus, but it is not transactional, and it is not capable of committing its work atomically.
I've made the transport postpone the actual writing of outgoing messages until after you've done your own work in your message handler, so messages won't become visible to recipients too soon. But in theory you could run into a situation where a bunch of outgoing messages were sent and then the deletion of the received message file fails, which will result in receiving the same message again; that's why it's called "at least once" ;)
3) It uses JSON because that's an easy way to write an object to a file (even though the actual message body is serialized and encoded using the configured serializer).
4) ??? I don't understand your question :)
5) Yes and no. I guess it would be just as good as MSMQ if we are talking about local, not resource-intensive messaging.
I haven't performed any load tests, but I'm guessing it will be much slower than MSMQ in terms of message volume. I do think it is capable of transferring messages that are much, much bigger than MSMQ allows though, because MSMQ still has (to my knowledge) a hard upper cap of 4 MB per message.
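For reference, wiring up the file system transport might look roughly like this (sketched from memory against Rebus's configuration API, so treat the exact call as an assumption; directory and queue name are illustrative):

```csharp
// Sketch (assumed Rebus file system transport API).
using System;
using Rebus.Activation;
using Rebus.Config;

class FileSystemTransportDemo
{
    static void Main()
    {
        using (var activator = new BuiltinHandlerActivator())
        {
            Configure.With(activator)
                // Message files land under this directory - which could even
                // be a Dropbox folder, as in the original experiment.
                .Transport(t => t.UseFileSystem(@"C:\rebus-queues", "my-queue"))
                .Start();

            Console.ReadLine(); // keep the endpoint running
        }
    }
}
```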
I have a setup in which two nodes are going to communicate a lot. On Node A there are going to be thousands of processes, which are meant to access services on Node B. There is going to be a massive load of requests and responses between the two nodes. The two nodes will run on two different servers, each on its own hardware.
I have 3 options: HTTP/1.1, rpc:call/4, and directly sending a message to a registered gen_server on Node B. Let me explain each option.
HTTP/1.1: Suppose that on Node A I have an HTTP client like ibrowse, and on Node B a web server like Yaws 1.95, with the web server able to handle unlimited connections and the operating system settings tweaked to allow Yaws to handle all of them. Then I make my processes on Node A communicate using HTTP. In this case each method call would mean a single HTTP request and a reply. I believe there is overhead here, but we are evaluating options. The Erlang built-in mechanism called webtool may be built for this kind of purpose.
rpc:call/4: I could simply make direct rpc calls from Node A to Node B. I am not very sure how the underlying rpc mechanism works, but I think that when two Erlang nodes connect via net_adm:ping/1, the created connection is not closed, and all rpc calls use this pipe to transmit requests and pass responses. Please correct me on this one.

Sending a message from Node A to Node B: I could make my processes on Node A just send messages to a registered process, or a group of processes, on Node B. This too seems a clean option.
Q1. Which of the above options would you recommend, and why, for an application in which two Erlang nodes are going to have enormous communication between them all the time? Imagine a messaging system in which the two Erlang nodes are the routers :)
Q2. Which of the above methods is cleaner, less problematic and more fault tolerant (I mean that the method should NOT have a single point of failure that could leave all processes on Node A blind)?
Q3. For the mechanism of your choice: how would you make it even more fault tolerant or redundant?
Assumptions: the nodes are always alive and will never go down, the network connection between the nodes will always be available and non-congested (dedicated to the two nodes only), and the operating system has allocated maximum resources to these two nodes.
Thank you for your evaluations.
HTTP is definitely out. Just the round-trip overhead of creating a new connection is a problem.
As for Erlang connections and using pids, you have the advantage that you can subscribe to node-down messages and handle the case where a node goes down. A single TCP connection should give you very fast speeds; however, be aware that it works like one long pipe: messages are muxed and demuxed on it, which can affect latency on the line. It also means that large messages will block small messages from getting through.
How much bandwidth are you aiming for, and at what latency? What are the 95th and 99th percentiles for answering messages? It is better to put up some rough numbers and then try to target them than just aiming for "as fast as possible". Set your success criteria first.
Q1: HTTP will add extra overhead and, in my opinion, give you nothing. HTTP would be useful if you were designing a REST API. Directly sending messages and rpc:call look about the same as far as overhead is concerned.
Q2: Sending messages is much, much cleaner. It's the way Erlang is designed. With RPC calls you must always track which call is executed where and under which circumstances, which can be a huge issue if the two servers have state. Also, RPC calls are synchronous.
Q3: I would use UBF if I could afford the minor overhead; otherwise I would directly send messages between the Erlang nodes. If bandwidth is an issue, other trickery would be needed as well, like encoding the messages in some way and then using a compression algorithm to reduce their size; alternatively, I might ditch Erlang message passing altogether and use UDP sockets.
It is not obvious that ! is the best way to go. It is definitely the easiest, and the code will be the most elegant.
In terms of scalability, take into consideration that to use rpc/! you have to maintain an Erlang cluster. I found it painful with just 10-20 nodes, even in a private cloud. I would never recommend bigger deployments on e.g. EC2, where I/O, latency and the network are not deterministic.
I recommend structuring the project in a way that will let you swap out the communication engine in the future. HTTP is pretty heavy, but there are other options:
socket-to-socket (tcp/udp/sctp)
amqp (many benefits connected to load balancing)
zeromq (even nicer than amqp)
Betting on !/rpc and an OTP cluster is risky. You will fight full-mesh overhead, master election algorithms and quorum/partition detection.
I am sending small messages consisting of XML (about 1-2 KB each) across the internet from a Windows application to an ASP.NET web service.
99% of the time this works fine, but sometimes a message will take an inordinate amount of time to arrive: 25-30 seconds instead of the usual 4-5 seconds. This delay also causes messages to arrive out of sequence.
Is there any way I can solve this issue so that all the messages arrive quickly and in sequence, or is that not possible to guarantee when using a web service in this manner?
If it's not possible to resolve, can I please get recommendations for a low-latency messaging framework that can deliver messages in order over the internet?
Thanks.
Is there any way I can solve this issue so that all the messages arrive quickly and in sequence, or is that not possible to guarantee when using a web service in this manner?
Using just web services, this is not possible. You will always run into situations where occasionally something takes much longer than it "should". This is the nature of network programming, and you have to work around it.
I would also recommend using XMPP for something like this. Have a look at xmpp.org for info on the standard, and at jabber-net for a set of client libraries for .NET.
Well, this is a little off target, but have you looked into the XMPP (Jabber) protocol?
It's the messaging system that GTalk uses. Quite simple to use. The only downside is that you will need a stateful service to receive and process the messages.
I also agree with Mat's comment. It was the first solution that came to mind; then I remembered that I had used XMPP in the past to accomplish fast, small and reliable messaging between servers.
http://xmpp.org/about-xmpp/
If you search Google you will easily find .NET libraries which support this protocol,
and there are plenty of free Jabber servers out there.
One way to ensure your messages are sent in sequence, and are resolved as a batch together, is to make one call to the web service with all messages that depend on each other as a single batch.
Traditionally, when you make a call to a web service, you do not expect that other calls to the web service will occur in a specific order. It sounds like there is an implicit sequence the data needs to arrive in at the destination application, which makes me think you need to group your messages together and send them together to guarantee that order.
No matter the speed of the messaging framework, you cannot prevent a race condition that could deliver messages out of order, unless you send one message that carries your data in the correct order.
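A minimal sketch of that batching idea in C# (the envelope type and the service proxy call are hypothetical placeholders for your own web service contract):

```csharp
// Sketch: one ordered batch per call; "SubmitBatch" is a hypothetical proxy
// method standing in for your own web service operation.
using System.Collections.Generic;

public class MessageBatch
{
    // The server processes these in list order, so ordering is preserved
    // by construction rather than by delivery timing.
    public List<string> Messages { get; set; } = new List<string>();
}

public static class BatchSender
{
    public static MessageBatch BuildBatch(IEnumerable<string> orderedMessages)
    {
        var batch = new MessageBatch();
        batch.Messages.AddRange(orderedMessages);
        return batch;
        // Then make a single call, e.g.: serviceClient.SubmitBatch(batch);
    }
}
```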
If you are sending messages in a sequence across the internet, you will never know how long a message will take to travel from one point to another. One possible solution is to include in each message its position in the sequence, and at each endpoint implement logic to order the messages before processing them. If you receive a message out of sequence, you can wait for the missing message, or ask the other endpoint to resend it.
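A sketch of that reordering logic: each message carries its sequence number, and the receiver buffers out-of-order arrivals until the missing one shows up (the types here are illustrative, not from any particular framework):

```csharp
// Sketch: buffer-and-reorder at the receiving endpoint.
using System;
using System.Collections.Generic;

public class Resequencer
{
    private readonly SortedDictionary<long, string> _buffer =
        new SortedDictionary<long, string>();
    private long _nextExpected = 1;

    // Call for every arriving message; 'process' runs in strict sequence order.
    public void Receive(long sequence, string payload, Action<string> process)
    {
        _buffer[sequence] = payload;

        // Drain every message that is now in order.
        while (_buffer.TryGetValue(_nextExpected, out var next))
        {
            process(next);
            _buffer.Remove(_nextExpected);
            _nextExpected++;
        }
        // Anything left in _buffer is ahead of a missing message: wait for
        // it, or ask the sender to retransmit.
    }
}
```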