bind zmq dealer to multiple repliers - asynchronous

I want to create a proxy server which routes incoming packets from REQ type sockets to one of the REP sockets on one of the computers in a cluster. I have been reading the guide and I think the proper structure is a combination of ROUTER and DEALER on the proxy server. Where the ROUTER passes messages to the dealer to be distributed. However, I cannot figure out how to create this connection scheme. Is this the correct architecture? If so how to I bind a dealer to multiple addresses. The flow I envision is like this REQ->ROUTER|DEALER->[REP, REP, ...] where only one REP socket would handle a single request.

NB: forget about packets -- think in terms of "Behaviour", that's the key
ZeroMQ is rather an abstract layer for certain communication-behavioral patterns, so while terms alike socket do sound similar to what one has read/used previously, the ZeroMQ-world is by far different from many points of view.
This very formalism allows ZeroMQ Formal-Communication-Patterns to grow in scale, to get assembled in higher-order-patterns ( for load-balancing, for fault-tolerance, for performance-scaling ). Mastering this style of thinkign, you forget about packets, thread-sync-issues, I/O-polling and focus on your higher-abstraction-based design -- on Behaviour -- rather than on underlying details. This makes your design both free from re-inventing wheel & very powerful, as you re-use a highly professional tools right for your problem-domain tasks.
DEALER->[REP,REP,...] Segment
That said, your DEALER-node ( in fact a ZMQsocket-access-node, having The Behaviour called a "DEALER" to resemble it's queue/buffering-style, it's round-robin dispatcher, it's send-out&expect-answer-in model ) may .bind() to multiple localhost address:port-s and these "service-points" may also operate over different TransportClass-es -- one working over tcp://, another over inproc://, if that makes sense for your Design Architecture -- ZeroMQ empowers you to use this transparently abstracted from all the "awfull&dangerous" lower level gritty-nitties.
ZeroMQ also allows to reverse .connect() / .bind()
In principle, where helpfull, one may reverse the .bind() and .connect() from DEALER to a known target address of the respective REP entity.

You leave a couple details out that are important to determining the correct architecture.
When you say "from REQ type sockets to one of the REP sockets on one of the computers in a cluster", how do you determine which computer gets the message? Is it addressed to a specific computer? Does a computer announce its availability before it can receive a message? Does each message just get passed to the next one in line in a round-robin fashion? (if it's not the last one, you probably don't want a DEALER socket)
When you say "how do I bind a dealer to multiple addresses", it's not clear what you mean by "addresses"... Do you mean to say that the proxy has a unique IP address that it uses to communicate with each computer in the cluster? Or are you just wondering how to manage the connection to multiple different peers with the same socket? The former is a special case, the latter is simple.
I'm going to work with the following assumptions:
You want a worker computer from the cluster to announce its availability for work before it receives any work, and any computer in the cluster can handle any job. A faster worker, or a worker working on a smaller job, will not have to wait behind some slow worker to finish their job and get a new job first.
The proxy/broker uses a single ip interface to communicate with all workers.
If those are true, then what you want will be closer to this:
REQ->ROUTER|ROUTER->[REQ, REQ, ...]
A worker will create a request to the backend router socket to announce its availability, and await a reply with work. Once it is finished, it will create a new request with the finished work, which again announces its availability. The other half of the pattern you've already worked out.
This is the Simple Pirate Pattern from the ZMQ guide. It's a good place to start, but it's not very robust. This is in the Reliable Request-Reply Patterns section of the guide, and I suggest you read or reread that section carefully as it will guide you well. In particular, they keep refining this pattern into more and more reliable implementations and wind up with the Majordomo pattern, which is very robust and fault tolerant. You should see if you need all the features that provides or if you can scale it back a little. Either way, you should learn and understand what these patterns are doing and why before you make the choice to do something different.

Related

Confusion about synchronous socket in ZeroMQ

This may appear as a silly question, but I am really confused about the terminology of the ZeroMQ regarding synchronous sockets like REQ and REP.
By my understanding a synchronous communication occurs when a client sends a message an then it blocks, until the response arrives. If ZeroMQ implemented a synchronous communication then only a .send() method would be enough for a synchronous socket.
I think that synchronous sockets terminology of ZeroMQ refers only to the inability of sending more messages until the response of the last message arrives, but the "sender" can still continue its processing ( doing more stuff ) asynchronously.
Is this true?
In that case, is there any straightforward way to implement a synchronous communication using ZeroMQ?
EDIT: Synchronous communication makes sense when I want to invoke a method in a remote process (like RPC). If I want to execute a series of commands in a remote process and each command needs the result of the previous one to do its job then asynchronous communication is not the best option.
To use ZMQ for implementing a synchronous framework, you can very nearly do it using just ZMQ; you can set the high water mark to 1. Unfortunately that's not quite it; what you want is an out going queue length of 0. Even more unfortunately, setting the high water mark to 0 is interpretted by ZMQ as infinity...
So the only option is to implement a synchronous transfer protocol on top of ZMQ. That's not very difficult to do. The conversation between the two ends will be something like "can I send?", "yes you can send now", "ok here it is", "ok I have received it" (both ends return to caller) (or at least the programatic version of that). This sets up what is called an execution rendevous - both ends know that they both reached a certain point of execution.
Technically speaking what you're doing is taking ZeroMQ (Actor Model) and turning it into something more like Communicating Sequential Processes.
RPC
Having said all that, from your edit I think you might like to consider Cap'n Proto. This is a C++ serialisation technology that has a neat RPC trick. If the return from one RPC call is the input to another, you can chain those all together somehow in advance (see here).
Let's start with a first stepforget everything you know about sockets.
ZeroMQ is more a concept of thinking about distributed-systems ( multi-agent like ) and how to design a software, with a use of such a smart signalling / messaging framework.
This is the core aim of the ZeroMQ, to allow designers remain thinking in the application domain and let all the low level dirty work to be operated actually without much of the designers' need to care of.
If have just recently started with ZeroMQ, one may enjoy a short read about a ZeroMQ global view first, before discussing details.
Having read and understood the concept of the ZeroMQ hierarchy, it is way simpler to start on details:
given a local Context() instance is a data-pumping engine and having a REQ/REP Scalable Formal Communications Archetype pattern in mind, the story is now actually a story about a network of distributed-Finite-State-Automata.
local process, operating just one side of the distributed REQ/REP communication archetype has zero power to influence the remote process to receive or not the message that was passed from the local process over to the ZeroMQ delivery services towards the indended recipient(s) in a fair belief. The less the local process can influence the remote process' intent to respond at all or not, so welcome to the realms of distributed multi-agent games.
Both the REQ and the REP formal behaviour has to meet its both the { local | distributed-mode }-expected sort of behaviour -- REQ asks first, REP answers then, so as to keep the contracted promise. The point is, that this behaviour is distributed and split among a pair of nodes, plus there are cases, when network incidents may throw the distributed-FSA into an unsalvageable mutual deadlock ( one may find more posts on this here zeromq quite often ).
So, your local-side REQ code imperatively .send()-s and has no obligation to stop without doing anything reasonable until REP-side .recv( zmq.NOBLOCK )-s or not ( no one has any kind of warranty a remote node exists at all, similarly, one has to set oneselves ready to anticipate and handle all cases, where a remote side will never respond, so many "new" challenges appear from the nature of a distributed multi-agent ecosystem ).
There are smart ways to handle this new breed of distributed chaos and uncertainties, using, best using .poll() and non-blocking forms of either the .send() and .recv()-methods, as these let user-code to remain capable of handling all expected and un-expected events in due time and fashion.
One may also operate rather many co-existent ZeroMQ connections, so as to prioritise and specialise each and any form of the multi-agents' interactions in a distributed system design, even for designing in fault-resilience and similar high-level robustness concept, where asynchronous nature of each of the interactions avoids a need of any sort of coordination or synchronisation with a remote ( possibly even not yet present ) agent, which is principally an autonomous entity, having it's own domain of control, so again, being principally asynchronous to what local-side agent might "expect", the less "influence" in any other form but by an attempt to send "there" a message "telegram".
So yes,ZeroMQ is asynchronous brokerless signalling / messaging framework.
For (almost) synchronous communications, one may take steps and measures to trim down the ( principally distributed ) asynchronous control loops -- best update your post with an MCVE example and details about what are your particular goals for being achieved.

Network design: one-to-many high bandwidth transmissions that excludes users

If StackOverflow is the wrong Exchange for this question, please help direct me to the correct one.
Short Version
What is the best design for a networking application in which one user transmits a constant, high-bandwidth stream of data to many other addresses? The solution must not require the uploader to duplicate the packets for each recipient and preferably will not transmit to users that have not been accepted by the transmitter.
Long Version
A friend and I have written an application that enables someone to transmit data in real time to one or more recipients that he wants to receive the data. I have designed the high-level application protocol to use UDP and to encode the data so that each packet can be lost without hurting the use of the rest. This solution requires managing sockets with each user and sending each packet to every user.
The problem here is that the stream can be very high bandwidth. The user can modify the settings for how high quality the data he is sending should be, and can end up sending 6 Mbps to each user. It is unfeasible to expect a user to pay his ISP enough to be allowed to upload such a stream to the preferred minimum of four other users at a time.
We need a way for the transmitter to send a packet exactly once and have each user receive a copy.
We have looked at multicasting. It may be what we need to use in the end, but we are concerned about the fact that anyone can join any group. It would be preferable to not allow users we do not want to see the data to not be allowed to join in. There is also the problem that if multiple transmitters happen to use the same group, viewers may find that they are receiving multiple streams' worth of data when they only want one.
My searching has revealed something IBM published over a decade ago called Explicit Multicast (Xcast) that looks perfect, but I have yet to find any information to determine whether this technology is commonly supported. Also, I have not yet seen whether it supports datagrams.
Does anyone know the best way to design an application that meets our needs?
Please keep in mind that we have no funds to support our project. Solutions need to be free.
Edit
In the summary above, I hinted at but failed to explicitly state that this is for a real-time application. The motivating drive behind the application is to keep the clients/recipients as close together in time as is possible. If packets are lost or arrive too late to be used in keeping the server and clients in phase, they need to be disregarded. That is why I designed the application protocol on top of UDP with independent data in each packet. Even if a client receives only one packet out of 300 for a given time step, it will use what it did get.
I think that I_am_Helpful's recommendation may be a good step in the right direction (or possibly the destination). I need to do some experimentation to determine whether using a system like Spread will work. However, I do not think I can budget more than additional 17 ms in transmission time.
If you can think of a system that enables sending unreliable datagrams to a specific group of users (like Spread) for a real-time application (unlike Spread, see p. 3), please let me know about it.
We need a way for the transmitter to send a packet exactly once and
have each user receive a copy.
In my limited knowledge, I would say that Reliable Multicasting appears to be one of the viable option for broadcasting in the group. I would like to mention that there are some of the possible Java API's* which could help you achieving the same :
JGroups Java API
The Spread Toolkit -> Spread consists of a library that user applications are linked with, a binary daemon which runs on each computer that is part of the processor group, and various utility and demonstration programs.
Appia
*NOTE : I have never worked with these API's.
It would be preferable to not allow users we do not want to see the
data to not be allowed to join in.
They do provide this feature, e.g., Spread supports thousands of groups with different sets of members. It also provides a range of reliability, ordering and stability guarantees for messages. JGroups can be used to create groups of processes whose members can send messages to each other. It also has facilities like group creation and deletion(Group members can be spread across LANs or WANs).
There is also the problem that if multiple transmitters happen to use
the same group, viewers may find that they are receiving multiple
streams' worth of data when they only want one.
When you could easily create multiple groups in the same network(using Spread,etc.), then, I believe that would no longer be an issue. It is your responsibility to declassify users into different groups.
I hope the given information helps. Good LUCK.
Via multicast you achieve exactly you want: sending each packet once, but authentication seems to be a concern for you.
One possible solution could be simetric cryptography, where the same key is used to encrypt and decrypt. Via TCP your clients connect to a server and fetch the multicast IP Address of the transmission and its associated key, then they join the multicast group and decrypt the incomming transmission.
If you accept a more flexible solution, you could have a server which sends a transmission in real time to a set of distributed servers. Your clients connect to one of these distributed servers via unicast, and after authentication is done, they are inluded in a list of receivers. Each distributed server sends each new transmission packet to each registered client via UDP. in ordinary situations your clients would have the same experience as if it was delivered in a multicast group, but the servers will spend far more bandwidth. Multiple transmission at a time will be allowed, so it could be good for you, and you can have more control, as clients can send signals to the servers, like PAUSE, and etc.

Handle network protocol steps outside main listen-loop

I'm dealing with network programming (especially P2P systems) lately. The usual program I deal with, somewhere has something like this (running in it's own thread):
while True:
handle(receive())
How do I deal with a series of dependent send/receive actions. For example when I want to have something like:
def inviteNode(receiver):
send(receiver, INVITE)
if receive() == OK:
send(receiver, SOME_INFORMATION)
...
I mean several send/receive actions that depend on each other and have a certain order. It would be nice to have something like the inviteNode() above (because all steps of the protocol are at the same location in the code, and you can retrace the order just by looking at the code), but receive() calls outside of my listen loop just won't do it, because how should it be decided which receive() gets to receive the data.
Is having a global state the only solution for this? After doing the first send(receiver, INVITE) do I have to memorize somewhere, that I expect to receive an OK from that specific Node, I just sent the INVITE to? Isn't this very complex when I have several different of these dependent send/receive actions?
PS: Just to make sure: This is about UDP connections.
Seems like you need to use the sockets polling. You can use select() system call (cross-platform solution, but it's not so fast as it can be with special, system dependent API) or if you're targeting linux, you can try to use the epoll API.
The main idea is: you have some network sockets, those can be in different states (for example, the data can be received).
You can query this socket set for some events (Read,Write,Exception) and perform needed actions dependent on what you need.
For example, Nginx http server uses this architecture.
Also, you will need to save the context, for example which sockets should be notified and checked and what data was already received or sended, and other information you may need.
This is a bit complex but can do the job: finite-state machines.
http://linux.die.net/man/2/select
http://msdn.microsoft.com/en-us/library/windows/desktop/ms740141%28v=vs.85%29.aspx

{ ProcessName, NodeName } ! Message VS rpc:call/4 VS HTTP/1.1 across Erlang Nodes

I have a setup in which two nodes are going to be communicating a lot. On Node A, there are going to be thousands of processes, which are meant to access services on Node B. There is going to be a massive load of requests and responses across the two nodes. The two Nodes, will be running on two different servers, each on its own hardware server.
I have 3 Options: HTTP/1.1 , rpc:call/4 and Directly sending a message to a registered gen_server on Node B. Let me explain each option.
HTTP/1.1 Suppose that on Node A, i have an HTTP Client like Ibrowse, and on Node B, i have a web server like Yaws-1.95, the web server being able to handle unlimited connections, the operating system settings tweaked to allow yaws to handle all connections. And then make my processes on Node A to communicate using HTTP. In this case each method call, would mean a single HTTP request and a reply. I believe there is an overhead here, but we are evaluating options here. The erlang Built in mechanism called webtool, may be built for this kind of purpose.
rpc:call/4 I could simply make direct rpc calls from Node A to Node B. I am not very susre how the underlying rpc mechanism works , but i think that when two erlang nodes connect via net_adm:ping/1, the created connection is not closed but all rpc calls use this pipe to transmit requests and pass responses. Please correct me on this one.Sending a Message from Node A to Node B I could make my processes on Node A to just send message to a registered process, or a group of processes on Node B. This too seems a clean option.
Q1. Which of the above options would you recommend and why, for an application in which two erlang nodes are going to have enormous communications between them all the time. Imagine a messaging system, in which two erlang nodes are the routers :) ? Q2. Which of the above methods is cleaner, less problematic and is more fault tolerant (i mean here that, the method should NOT have single point of failure, that could lead to all processes on Node A blind) ? Q3. The mechanism of your choice: how would you make it even more fault tolerant, or redundant? Assumptions: The Nodes are always alive and will never go down, the network connection between the nodes will always be available and non-congested (dedicated to the two nodes only) , the operating system have allocated maximum resources to these two nodes. Thank you for your evaluations
HTTP is definitely out. Just the round-trip overhead of creating a new connection is a problem.
As for Erlang connections and using Pids, you have the advantage that you can subscribe to node-down messages and handle the case where a node goes down. A single TCP connection should be able to give you very fast speeds, however, be aware that it works like a long pipe: messages are muxed and demuxed on a pipe which can affect latency on the line. It also means that large messages will block small messages from getting through.
How much bandwidth are you aiming for, and at what latency? What is the 95th and 99th percentile of answering messages? It is better to put up some rough numbers and then try to target these than just having "as fast as possible". Set your success criteria first.
Q1: HTTP will add additional overhead and will give you nothing in my opinion. HTTP would be useful if you were designing a REST API. Directly sending messages and rpc:call look about the same as far as overhead is regarded.
Q2: Sending messages is much much clearer. It's the way erlang is designed. With RPC calls you must always track which call is executed where and under which circumstances which can be a huge issue if the two servers have state. Also RPC calls are synchronous.
Q3: I would use UBF if I can afford minor overhead, otherwise I would directly send messages between the erlang nodes. If the bandwidth is an issue other trickery would be needed as well. Like encoding the messages in same way and then using some compression algorithm to reduce the size of the message, alternatively I may ditch the erlang message passing altogether and use UDP sockets.
It is not obvious that ! is the best way to go. Definitely, it is the easiest and the code will be the most elegant.
In terms of scalability, take under consideration that to use rpc/! you have to maintain an erlang cluster. I found it painful having just 10-20 nodes even in private cloud. I would never recommend bigger deployments on e.g. EC2, where io/latency/network is not deterministic.
I recommend to structure the project in a way that will let you exchange communication engine in the future. Also HTTP is pretty heavy, but there are options:
socket-socket (tcp/udp/sctp)
amqp (many benefits connected to load balancing)
zeromq (even nicer than amqp)
Betting on !/rpc and OTP cluster is risky. You will fight with full mesh overhead, master election algos and quorum/partition detection.

Network or Transport Layer Fuzzing

How do I go about executing a fuzzing strategy to stress a network stack, specifically at the third and fourth layers (network and transport)? I've looked at frameworks to generate fuzzers, like SPIKE, but it seems to me that they are mostly focused on the application layer and above? Is there any well known techniques out there to fuzz well-known protocols in these layers, say, TCP?
Thanks.
Look at Scapy. It allows you to fuzz at the network and transport layers. The fuzz function will fuzz anything you didn't explicitly specify in the IP or TCP layers (you can apply it separately to each). This gives you a range of abilities from just randomly generating ip addresses and port pairs to making and sending nonsense packets.
You may also want to look at Fragroute. This will twist TCP/IP into using all sorts of evasions techniques, but could potentially unveil otherwise hidden bugs/vulnerabilities in your network stack.
Furthermore, if your organization doesn't object, you could set up a Tor exit node and capture traffic from it. I've found it useful for testing correct TCP connection state tracking. Though your end of the connections is well-known and unchanging, there's a huge variety of servers as well as fun network congestion issues. It's basically an endless source of traffic. Be sure to check with your higher ups as your org may object to being a potential source of malicious traffic (even though there is a strong precedent of non-liability). I've gotten around that issue by running it/capturing at home, then bringing in the pcaps.
If you want to fuzz the IP, UDP, or TCP route your packets from your high level services via loopback to a process that reads them, fuzzes them, and forwards them. You need a driver that lets you talk to raw sockets and you need to read/learn what the applicable RFCs say for those protocols.
There is an easy way to do this. Just as Justdelegard recommends, Scapy is probably the best thing to use, in general.
Take a look at Releasing ICMPv4/IP fuzzer prototype by Laurent GaffiƩ. His Python code, which incidentally he has reposted in more readable fashion at pastebin.com, imports from scapy and uses some methods he defines to do a couple of types of fuzzing. IP and ICMP packets are handled in his sample code. So, this sounds exactly like what you are seeking.
Right now, there seems to be a lot of companies using Tcl/Expect to do custom automated testing of networks. SIP, H.323, layer 2 & 3 protocols, etc.
So if Scapy does not meet your needs, you might be able to make or find something written in Tcl using Expect to do the job. Or, you may wish to do some things in Python, using Scapy - and other things in Tcl, using Expect.
Tcl has long been used for network test and management applications. There was a book on how to use Tcl to do SNMP-based network management way back in the 1990's.
Syntax of Tcl is decidedly odd but the libraries are very powerful. It comes with a framework-like ability to define behavior of custom network behavior atop sockets, similar to what you can do with the standard libraries for the Python programming language.
Unlike Python and other scripting languages, there is an extremely powerful tool for Tcl programs named Expect (see expect man page).
Expect has a handy capability. It can auto generate a Tcl test script. The generated script makes calls to Expect functions. When doing this recording, it functions as a passive man-in-the-middle, recording both sides of the conversation. Kind of the way that you record Macros while you do some editing in MS Word or in Emacs.
Then afterward, you can edit the automatically-generated Expect script to fine tune it, make it behave differently, or creation multiple variations of it. It is very handy for creating regression tests. You should be able to use this to kickstart writing higher layer protocol tests, should you need some. Beats starting from scratch.
I think you can use Tcl/Expect to test standard TCP applications (FTP, HTTP, SMTP, etc.) that use string based commands. It works well for testing character based applications like TELNET that read input from stdin and generate output to stdout too.

Resources