TCP Socket Piping

Suppose you have two sockets (each connected to a different TCP peer), both residing in the same process. How can these sockets be bound together, so that the input stream of each is fed into the output stream of the other? The sockets will carry data continuously, with no waiting. Normally a thread can solve this problem, but rather than creating threads, is there a more efficient way of piping sockets?

If you need to connect both ends of the socket to the same process, use the pipe() function instead. This function returns two file descriptors, one used for writing and the other used for reading. There isn't really any need to involve TCP for this purpose.
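A minimal sketch of that, assuming Python on a POSIX system:
import os

# pipe() returns a pair of file descriptors: one for reading, one for writing.
read_fd, write_fd = os.pipe()

# Anything written to the write end can be read back from the read end,
# entirely within the same process and without involving TCP.
os.write(write_fd, b"hello through the pipe\n")
print(os.read(read_fd, 1024))

os.close(write_fd)
os.close(read_fd)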
Update: Based on your clarification of your use case, no, there isn't any way to tell the OS to connect the ends of two different sockets together. You will have to write code to read from one socket and write the same data to the other. Depending on the architecture of your process, you may or may not need an additional thread to do this work. For example, if your application is based on a select() loop, then creating another thread is not necessary.
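For the clarified use case, a select()-based relay might look roughly like this sketch, where sock_a and sock_b stand for the two already-connected sockets and are assumptions of the example:
import select

def relay(sock_a, sock_b):
    # Forward bytes in both directions until either peer closes its connection.
    peer = {sock_a: sock_b, sock_b: sock_a}
    while True:
        readable, _, _ = select.select([sock_a, sock_b], [], [])
        for s in readable:
            data = s.recv(4096)
            if not data:              # this side closed; stop relaying
                return
            peer[s].sendall(data)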

You can avoid threads with an event queue within the process. The WP Message queue article assumes you want interprocess message passing, but if you are using sockets, you are effectively doing interprocess-style message passing within the same process.

Related

Dynamically Creating Communicators

I have a small communication problem that has consumed hours of searching. I am using MPICH2 to communicate between different workers. At some points in my program a process needs to multicast a message to a fraction of the workers (2 or 3 out of a total of 20). Therefore, I temporarily need to create a group that includes the ranks of all those workers and then use MPI_Bcast. However, this seems to be impossible!
I have tried MPI_Comm_create, but the program simply hangs because it requires "every" worker to call MPI_Comm_create. I also cannot use MPI_Comm_split because I do not know the ranks of the recipient workers in advance and hence cannot color-code them.
Could you please help me?
Why do you need to create a new communicator at all?
Your description of what you actually want to achieve and what the constraints are is a little lacking, but here are some hints that might be applicable to your problem.
Sticking to classical two-sided communication, you will at some point need a communication step that involves all processes in order to identify the recipients, I guess. You could, for example, broadcast to everybody who the recipients are, and subsequently send the actual message to those recipients with point-to-point communication (if this relation is going to change over time, I would not bother with creating a new communicator each time).
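For that first approach, a minimal sketch with mpi4py might look like this (ROOT, the tag, and the recipient list are invented for illustration):
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
ROOT = 0                                  # illustrative broadcasting rank

# Everybody learns who the recipients are via a collective call ...
recipients = comm.bcast([2, 5, 7] if rank == ROOT else None, root=ROOT)

# ... and the actual payload then travels point-to-point, only to those ranks.
if rank == ROOT:
    for dest in recipients:
        comm.send({"payload": "hello"}, dest=dest, tag=42)
elif rank in recipients:
    msg = comm.recv(source=ROOT, tag=42)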
You could use MPI's one-sided communication concepts and simply write messages from the broadcasting rank into dedicated memory areas of the receiving ranks. However, one-sided communication is often considered awkward to work with, and its performance is not always great.
With MPI-3 you could make use of a non-blocking barrier: all processes open the barrier. Those that are not the broadcasting rank immediately start testing for completion of this barrier, also post a non-blocking receive for any source and regularly test it, and otherwise proceed as usual. The broadcasting rank, however, starts sending out its message to the actual recipients and, once that has completed, waits for the non-blocking barrier to complete. Now all processes will find the barrier complete and can stop listening for receives; those that did not get a message can simply send a message to themselves to properly close the outstanding receive and proceed with their computation.
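A rough rendering of that MPI-3 pattern, assuming mpi4py; the ranks and tag are invented, and a synchronous send stands in for the self-send trick described above:
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
ROOT, TAG = 0, 99                 # illustrative broadcasting rank and tag
recipients = [2, 5]               # in practice known only to the broadcasting rank

if rank == ROOT:
    # Synchronous sends complete only once they have been matched by a receive,
    # so entering the barrier afterwards signals "all notifications delivered".
    for dest in recipients:
        comm.ssend("notify", dest=dest, tag=TAG)
    comm.Ibarrier().Wait()
else:
    barrier = comm.Ibarrier()     # enter immediately, then keep working
    while not barrier.Test():     # completes once the root has finished sending
        if comm.iprobe(source=MPI.ANY_SOURCE, tag=TAG):
            comm.recv(source=MPI.ANY_SOURCE, tag=TAG)
        # ... regular computation continues here ...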

QuickFix using Pipes, shared memory, message queues etc

Here's my scenario:
In my application I have several processes which communicate with each other using QuickFIX, which internally uses TCP sockets. The flow is like:
Process 1 sends a QuickFIX message -> process 2 sends a QuickFIX message after processing the message from process 1 -> ... -> process n
Similarly, the acknowledgement messages flow like:
process n -> ... -> process 1
Now, all of these processes except the last one (process n) are on the same machine.
I googled and found that TCP sockets are the slowest of IPC mechanisms.
So, is there a way to transmit and receive QuickFIX messages (obviously using their APIs) through other IPC mechanisms? If yes, I can then reduce the latency by using that IPC mechanism between all the processes which are on the same machine.
However, if I do so, do those mechanisms guarantee the transmission of complete messages like TCP sockets do?
I think you are doing premature optimization, and I don't think that TCP will be your performance bottleneck. Your local LAN latency will be lower than that of your exterior FIX connection. From experience, I'd expect perf issues to originate in your app's message handling (perhaps due to accidental blocking in OnMessage() callbacks) rather than in the IPC stuff going on afterward.
Advice: Write your communication component with an abstraction-layer interface so that later down the line you can swap out TCP for something else (e.g. ActiveMQ, ZeroMQ, or whatever else you may consider) if you decide you need it.
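For instance, a thin transport interface along these lines (all names invented for illustration) keeps the carrier swappable without touching the message-handling code:
from abc import ABC, abstractmethod

class MessageTransport(ABC):
    """Carrier-agnostic interface the rest of the application codes against."""

    @abstractmethod
    def send(self, raw_fix_message: bytes) -> None: ...

    @abstractmethod
    def set_receive_handler(self, handler) -> None: ...

class TcpTransport(MessageTransport):
    ...   # wraps the existing QuickFIX/TCP session

class SharedMemoryTransport(MessageTransport):
    ...   # drop-in replacement if profiling ever shows TCP to be the bottleneck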
Aside from that, just focus on making your system work correctly. Once you are sure the behavior is correct (hopefully with tests to confirm it), then you can work on performance. Measure your performance before making any optimizations, and then measure again after you make "improvements". Don't trust your gut; get numbers.
Although it would be good to hear more details about the requirements associated with this question, I'd suggest looking at a shared memory solution. I'm assuming that you are running a server in a colocated facility with the trade matching engine and using high speed, kernel bypass communication for external communications. One of the issues with TCP is the user/kernel space transitions. I'd recommend considering user space shared memory for IPC and use a busy polling technique for synchronization rather than using synchronization mechanisms that might also involve kernel transitions.
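A toy illustration of that idea using Python's multiprocessing.shared_memory (the segment name, layout, and single ready flag are made up; a real implementation would want a proper ring buffer and careful memory layout):
from multiprocessing import shared_memory

SEGMENT, SIZE = "fix_ipc_demo", 4096      # illustrative name and size

def producer(payload: bytes):
    # Writer side: place the payload, then flip the ready flag at offset 0.
    shm = shared_memory.SharedMemory(name=SEGMENT, create=True, size=SIZE)
    shm.buf[1:1 + len(payload)] = payload
    shm.buf[0] = 1                        # publish: flag is written last
    return shm

def consumer():
    # Reader side: busy-poll the flag in user space instead of blocking in the kernel.
    shm = shared_memory.SharedMemory(name=SEGMENT)
    while shm.buf[0] == 0:                # spin; burns a core but avoids syscalls
        pass
    return bytes(shm.buf[1:SIZE]).rstrip(b"\x00")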

Handle network protocol steps outside main listen-loop

I'm dealing with network programming (especially P2P systems) lately. The usual program I deal with somewhere has something like this (running in its own thread):
while True:
    handle(receive())
How do I deal with a series of dependent send/receive actions? For example, when I want to have something like:
def inviteNode(receiver):
    send(receiver, INVITE)
    if receive() == OK:
        send(receiver, SOME_INFORMATION)
        ...
I mean several send/receive actions that depend on each other and have a certain order. It would be nice to have something like the inviteNode() above (because all steps of the protocol are in the same place in the code, and you can retrace the order just by looking at it), but receive() calls outside of my listen loop just won't do it, because how should it be decided which receive() gets to receive the data?
Is having global state the only solution for this? After doing the first send(receiver, INVITE), do I have to memorize somewhere that I expect to receive an OK from the specific node I just sent the INVITE to? Isn't this very complex when I have several different dependent send/receive sequences?
PS: Just to make sure: This is about UDP connections.
It seems like you need to use socket polling. You can use the select() system call (a cross-platform solution, but not as fast as the special, system-dependent APIs), or if you're targeting Linux, you can try the epoll API.
The main idea is that you have a set of network sockets which can be in different states (for example, data may be available to read).
You can query this socket set for events (read, write, exception) and perform the needed actions depending on what happened.
For example, the Nginx HTTP server uses this architecture.
Also, you will need to save context: which sockets should be watched, what data has already been received or sent, and any other information you may need.
This is a bit complex but can do the job: finite-state machines.
http://linux.die.net/man/2/select
http://msdn.microsoft.com/en-us/library/windows/desktop/ms740141%28v=vs.85%29.aspx
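To make the finite-state-machine idea concrete for the inviteNode() example above, a single select() loop can keep per-peer state and decide what each incoming datagram means. The message constants, port, and state names are invented for this sketch, which assumes UDP as in the question:
import select
import socket

INVITE, OK, SOME_INFORMATION = b"INVITE", b"OK", b"INFO"   # illustrative wire format

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9000))

state = {}                         # per-peer protocol state, e.g. addr -> "AWAITING_OK"

def invite_node(addr):
    sock.sendto(INVITE, addr)
    state[addr] = "AWAITING_OK"    # remember what we expect next from this peer

def handle(data, addr):
    if state.get(addr) == "AWAITING_OK" and data == OK:
        sock.sendto(SOME_INFORMATION, addr)
        state[addr] = "INVITED"
    # else: unrelated traffic, handled by the normal listen-loop logic

while True:
    readable, _, _ = select.select([sock], [], [], 1.0)
    for s in readable:
        data, addr = s.recvfrom(2048)
        handle(data, addr)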

Interprocess Communication Unix C

I have two processes, A and B. B is a process that performs some functions. Process A is the one that controls B, i.e. process A instructs process B by providing data (control and functional) to it.
I have a thread in B dedicated to IPC; all that thread does is get instructions from process A, while the other running threads do whatever they have to with the already existing data.
I thought of pipes and shared memory using shmat, but I am not satisfied. I want something like this: only when process A writes a message to B should the IPC thread in B wake up. Any idea how to achieve this?
The specifics sort of depend on what kind of flexibility you need and who is using what pipes, but this should work: Have process B's IPC thread select for readability on the pipe. When process A writes to the pipe, process B's IPC thread will be awoken.
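A minimal sketch of that suggestion, assuming a named FIFO between the two processes (the path and message framing are illustrative):
import os
import select

FIFO_PATH = "/tmp/a_to_b_fifo"           # illustrative path

def ipc_thread(handle_instruction):
    if not os.path.exists(FIFO_PATH):
        os.mkfifo(FIFO_PATH)
    while True:
        # Blocks here until process A opens the FIFO for writing.
        fd = os.open(FIFO_PATH, os.O_RDONLY)
        while True:
            # The thread sleeps inside select() and wakes only when A writes.
            select.select([fd], [], [])
            msg = os.read(fd, 4096)
            if not msg:                  # A closed its end; wait for the next writer
                break
            handle_instruction(msg)
        os.close(fd)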
I found a solution. I made one of the threads open one end of the pipe for reading, do the actual read, and close it, all inside an infinite while loop.
The process which wants to write opens the pipe only when it needs to write, closes it afterwards, and will eventually end.
In fact this setup avoids synchronisation issues as well, but I don't know what the consequences are in terms of performance!

What are some tips for buffer usage and tuning in custom TCP services?

I've been researching a number of networking libraries and frameworks lately such as libevent, libev, Facebook Tornado, and Concurrence (Python).
One thing I notice in their implementations is the use of application-level per-client read/write buffers (e.g. IOStream in Tornado) -- even HAProxy has such buffers.
In addition to these application-level buffers, there's the OS kernel TCP implementation's buffers per socket.
I can understand the app/lib's use of a read buffer I think: the app/lib reads from the kernel buffer into the app buffer and the app does something with the data (deserializes a message therein for instance).
However, I have confused myself about the need/use of a write buffer. Why not just write to the kernel's send/write buffer? Is it to avoid the overhead of system calls (write)? I suppose the point is to be ready with more data to push into the kernel's write buffer when the kernel notifies the app/lib that the socket is "writable" (e.g. EPOLLOUT). But, why not just do away with the app write buffer and configure the kernel's TCP write buffer to be equally large?
Also, consider a service for which disabling the Nagle algorithm makes sense (e.g. a game server). In such a configuration, I suppose I'd want the opposite: no kernel write buffer but an application write buffer, yes? When the app is ready to send a complete message, it writes the app buffer via send() etc. and the kernel passes it through.
Help me to clear up my head about these understandings if you would. Thanks!
Well, speaking for haproxy, it has no distinction between read and write buffers; a single buffer is used for both purposes, which saves a copy. However, it makes some changes really painful. For instance, sometimes you have to rewrite an HTTP header, and you have to manage to move data around correctly for the rewrite and to save some state about the previous header's value. In haproxy, the connection header can be rewritten, and its previous and new states are saved because they are needed later, after being rewritten. With separate read and write buffers, you don't have this complexity, as you can always look back in your read buffer if you need any of the original data.
Haproxy is also able to make use of splicing between sockets on Linux. This means that it neither reads nor writes the data; it just tells the kernel what to take from where and where to move it. The kernel then moves the data by adjusting pointers, without copying, to transfer TCP segments from one network card to another (when possible); the data is never transferred to user space, thus avoiding a double copy.
You're completely right about the fact that in general you don't need to copy data between buffers. It's a waste of memory bandwidth. Haproxy runs at 10Gbps with 20% CPU with splicing, but without splicing (2 more copies), it's close to 100%. But then consider the complexity of the alternatives, and make your choice.
Hoping this helps.
When you use asynchronous socket I/O, the asynchronous read/write operation returns immediately. Since the asynchronous operation does not guarantee handling all the data (i.e. putting all the required data into the TCP socket buffer, or getting all the required data out of it) in a single invocation, the partial data must survive across multiple operations. So you need application buffer space to keep the data for as long as the I/O operations last.
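In other words, whatever the kernel will not accept right away has to be parked somewhere, which is exactly what the application-level write buffer is for. A sketch of the usual pattern for a non-blocking socket (the class name is invented for illustration):
import socket

class BufferedWriter:
    """Keeps bytes the kernel would not accept yet and retries them later."""

    def __init__(self, sock: socket.socket):
        sock.setblocking(False)
        self.sock = sock
        self.pending = bytearray()

    def write(self, data: bytes):
        self.pending += data
        self.flush()

    def flush(self):
        # Call again whenever the event loop reports the socket writable
        # (e.g. EPOLLOUT); sends as much as the kernel buffer will take.
        while self.pending:
            try:
                sent = self.sock.send(self.pending)
            except BlockingIOError:
                return                   # kernel buffer full; keep the rest for later
            del self.pending[:sent]

    def wants_write(self) -> bool:
        return bool(self.pending)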
