Which method is better for sending a stream of images between two processes: a local TCP/IP connection or interprocess communication?

Assuming that I have to copy each image in the stream (I cannot simply access the data under a mutex; it must be copied anyway), which method is better, and what are the pros and cons?
I would also like to know how much performance is lost this way, compared to using the images within the same process.
Thanks

For images, IPC through shared memory would be the best option.

On Windows, at least, firewalls can interfere even with local TCP/IP connections. Therefore I would prefer shared memory.

In terms of performance, IPC through shared memory is the best option, but IMHO,
even if sockets consume a little more processing, they will give you a better result in terms of how your software can evolve.

Google "Memory Mapped Files"
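Since the answers above all point at shared memory or memory-mapped files, here is a minimal sketch of the idea in C++ on POSIX; the file path, frame size, and function names are my own illustrative assumptions, not a definitive implementation:

```cpp
// Sketch: sharing an image frame between two processes through a
// memory-mapped file. File name and frame size are illustrative.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstring>
#include <cstddef>

// Map `bytes` of the file at `path`, creating/growing it as needed.
// Both producer and consumer call this with the same path.
unsigned char* map_frame_file(const char* path, size_t bytes) {
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, (off_t)bytes) != 0) { close(fd); return nullptr; }
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : static_cast<unsigned char*>(p);
}

// Producer: copy one frame into the mapping (the one unavoidable copy,
// with no socket hops through the kernel).
void publish_frame(unsigned char* shared, const unsigned char* frame, size_t bytes) {
    std::memcpy(shared, frame, bytes);
}
```

Both processes map the same file, so publishing a frame is a single memcpy and the consumer sees the bytes directly. A real pipeline would add a synchronization scheme on top so the consumer knows when a frame is complete.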

I would take the VCAM example of a DirectShow capture device (available at:
http://tmhare.mvps.org/downloads/vcam.zip)
This driver appears to the O/S as a video capture device and would run in the destination process. The source would use shared memory buffers to feed it frames to inject.
While more complicated than a minimal shared-memory IPC scheme, it gives an incredible advantage in that your video pipes can connect to most media player programs, capture and editing tools, etc.
I have done this several times, including features like sinks, mixers, Freeframe effect plugins, and so on. It should take a day or two to hack together.

Related

How to fork interactive programs

I have an interactive program with a high start-up cost. After start-up, I'd like to fork the process into separate concurrent sessions. Ideally each separate session would become a GNU screen window but being able to individually telnet/ssh to each session would be fine too.
It shouldn't be too hard to write this from scratch but it seems like something that should have been done/considered before and maybe there are reasons why this is a bad idea...
I know that an alternative approach is to use shared memory for the data that's expensive to initialize. The reason I'm reluctant to go down that path is that the shared data uses C++ data structures with pointers, which makes it hard to mmap it into an unrelated process.
This is what any database does: the startup is phenomenally expensive, but the db provides several different means of connecting, for example Oracle's BEQ protocol.
Telnet has issues; consider ssh. Either way, consider a daemon that answers connection requests on a port (you would use AF_UNIX sockets, I guess), then creates a separate session.
Stevens' Advanced Programming in the UNIX Environment and Rochkind's Advanced UNIX Programming both have discussions and complete examples. Since my Stevens book seems to have gone on extended holiday, see Rochkind 4.3 and 4.10.
And no, there is no pending doom for using this approach.
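A minimal sketch of the daemon-that-forks-sessions approach discussed in Stevens and Rochkind, assuming POSIX; here the descriptor would normally come from accept() on an AF_UNIX listening socket, and the echo is a placeholder for the real session logic:

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstring>

// After the expensive start-up, fork one child per session. The child
// inherits the fully initialized state and serves the connection; here
// it just echoes one message back as a stand-in for the interactive work.
pid_t spawn_session(int session_fd) {
    pid_t pid = fork();
    if (pid == 0) {                      // child: the interactive session
        char buf[256];
        ssize_t n = read(session_fd, buf, sizeof buf);
        if (n > 0) write(session_fd, buf, (size_t)n);  // echo placeholder
        _exit(0);
    }
    close(session_fd);                   // parent goes back to listening
    return pid;
}
```

Each forked child gets a copy-on-write view of the expensive start-up state, which is exactly what makes this cheaper than re-running initialization per session.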

Reliable udp broadcast libraries?

Are there any libraries which put a reliability layer on top of UDP broadcast?
I need to broadcast large amounts of data to a large number of machines as quickly as possible, and generally it seems like such a problem must have already been solved many times over, but I wasn't able to find anything except for the Spread toolkit, which has a somewhat viral license (you have to mention it in all materials advertising the end product, which I'm not sure our customer will be willing to do).
I was already going to write such a thing myself (because it would be extremely fun to do!) but decided to ask first.
I looked also at UDT (http://udt.sourceforge.net) but it does not seem to provide a broadcast operation.
PS I'm looking at something as lightweight as a library - no infrastructure changes.
How about UDP multicast? Have a look at the PGM protocol for which there are several commercial and open source implementations.
Disclaimer: I'm the author of OpenPGM, an open source implementation of said protocol.
Though some research has been done on reliable UDP multicasting, I haven't yet used anything like that. You should take into consideration that this might not be as trivial as it first sounds.
If you don't have a list of nodes in the target network you have no idea when and to whom to resend, even if active nodes receiving your messages can acknowledge them. Sending to a large number of nodes, expecting acks from all of them might also cause congestion problems in the network.
I'd suggest rethinking the network architecture of your application, e.g. using some kind of centralized solution where you submit updates to a server and it sends the message to all connected clients. Or, if the original sender node's address is known a priori, just let clients connect to it and let the sender push updates over these connections.
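As a sketch of that centralized alternative (class and function names are illustrative, and the client descriptors would come from accept() in practice), the server pushes each update down every established connection, so the sender always knows exactly who should receive and can resend on a per-connection basis:

```cpp
#include <unistd.h>
#include <vector>
#include <cstddef>

// Hub that fans one update out to every connected client over its
// established (TCP or AF_UNIX) connection.
class UpdateHub {
public:
    void add_client(int fd) { clients_.push_back(fd); }

    // Push one update to every connected client; returns how many
    // clients received the full payload.
    size_t broadcast(const void* data, size_t len) {
        size_t delivered = 0;
        for (int fd : clients_)
            if (write(fd, data, len) == (ssize_t)len)
                ++delivered;
        return delivered;
    }
private:
    std::vector<int> clients_;
};
```

Because every client has its own reliable connection, the ack and retransmission problems of reliable multicast disappear, at the cost of the server sending the data once per client.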
Have a look around the IETF site for RFCs on Reliable Multicast. There is an entire working group on this. Several protocols have been developed for different purposes. Also have a look around Oracle/Sun for the Java Reliable Multicast Service project (JRMS). It was a research project of Sun, never supported, but it did contain Java bindings for the TRAM and LRMS protocols.

UNIX Domain sockets vs Shared Memory (Mapped File)

Can anyone tell how slow UNIX domain sockets are compared to shared memory (or the alternative, a memory-mapped file)?
Thanks.
It's more a question of design than speed (shared memory is faster); domain sockets are definitely more UNIX-style and cause far fewer problems. Before choosing, know the trade-offs:
Domain Sockets advantages
blocking and non-blocking mode and switching between them
you don't have to free them when tasks are completed
Domain sockets disadvantages
must read and write in a linear fashion
Shared Memory advantages
non-linear storage
will never block
multiple programs can access it
Shared Memory disadvantages
need locking implementation
need manual freeing, even if unused by any program
That's all I can think of right now. However, I'd go with domain sockets any day, not to mention that it's then a lot easier to rework them for distributed computing. The speed gain of shared memory will be lost because of the need for a safe design. However, if you know exactly what you're doing and use the proper kernel calls, you can achieve greater speed with shared memory.
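To illustrate the blocking/non-blocking switching listed among the domain-socket advantages (function names here are mine), the same descriptor can be flipped at runtime with fcntl(), and a non-blocking read on an empty socket returns immediately with EAGAIN instead of stalling the caller:

```cpp
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>

// Toggle O_NONBLOCK on an already-open socket descriptor.
bool set_nonblocking(int fd, bool on) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return false;
    flags = on ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
    return fcntl(fd, F_SETFL, flags) == 0;
}

// A non-blocking read on an empty socket fails with EAGAIN/EWOULDBLOCK
// rather than blocking until data arrives.
bool would_block(int fd) {
    char c;
    ssize_t n = recv(fd, &c, 1, 0);
    return n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```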
In terms of speed, shared memory is definitely the winner. With sockets you will have at least two copies of the data: from the sending process to a kernel buffer, then from the kernel to the receiving process. With shared memory the latency is bounded only by the cache-coherency traffic between the cores on the box.
As Kornel notes, though, dealing with shared memory is more involved, since you have to come up with your own synchronization/signalling scheme, which may add a delay depending on which route you take. Definitely use semaphores placed in shared memory (implemented with futexes on Linux) to avoid system calls in the uncontended case.
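A sketch of that signalling scheme, assuming Linux/POSIX (the struct layout and names are illustrative): a process-shared semaphore placed directly in a shared mapping, so the uncontended post/wait path stays in user space with no system call:

```cpp
#include <semaphore.h>
#include <sys/mman.h>
#include <cstddef>

// Header living at the start of the shared segment. On Linux, sem_post/
// sem_wait are built on futexes, so only contended operations enter the
// kernel.
struct SharedQueueHeader {
    sem_t frames_ready;     // posted by the producer, waited on by consumers
    size_t write_index;     // illustrative bookkeeping
};

// Create the header in anonymous shared memory (a fork()ed child would
// inherit the mapping; unrelated processes would use shm_open instead).
SharedQueueHeader* create_shared_header() {
    void* p = mmap(nullptr, sizeof(SharedQueueHeader),
                   PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;
    SharedQueueHeader* h = static_cast<SharedQueueHeader*>(p);
    // second argument 1 = shared between processes, not just threads
    if (sem_init(&h->frames_ready, 1, 0) != 0) return nullptr;
    h->write_index = 0;
    return h;
}
```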
Both are inter process communication (IPC) mechanisms.
UNIX domain sockets are used for communication between processes on one host, similar to how TCP sockets are used between different hosts.
Shared memory (SHM) is a piece of memory where you can put data and share this between processes.
SHM provides random access via pointers; sockets can be written to and read from, but you cannot rewind or seek.
@Kornel Kisielewicz's answer is good IMO. I'm just adding my own results here for sockets in general, not only Unix domain sockets.
Shared Memory
Performance is very high: no copies, since the data is accessed raw, in place. The fastest option for sure.
Synchronization is needed. The design is not so easy to set up for complex cases.
Fixed size. Growing shared memory is doable, but the memory has to be unmapped first, grown, and then remapped.
The signaling mechanism can be quite slow, see: Boost.Interprocess notify() performance. This matters especially if you want many exchanges between processes. The signaling mechanism is not so easy to set up either.
Sockets
Easy to setup.
Can be used on different machines.
No complex synchronisation needed.
Size is not a problem if you use TCP. It's a simple design: a header containing the packet size, then the data.
Ping/pong exchange is fast because it can be serviced as a hardware interrupt by the OS.
Performance is average: a few copies of data are made.
High CPU consumption compared to shared memory. Socket calls are not cheap if you make a lot of them.
In my tests, exchanging small chunks of data (around 1 MByte/second) showed no real advantage for shared memory. I would even say that ping/pong exchanges were faster using TCP (thanks to its simple and efficient signaling mechanism). BUT when exchanging large amounts of data (around 200 MBytes/second), I saw 20% CPU consumption with sockets versus 3% with shared memory. So shared memory is a huge win in CPU terms, because socket read and write calls are not cheap.
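The "header containing the packet size" design mentioned above can be sketched like this (function names are mine, and error handling is minimal): a 4-byte big-endian length prefix tells the receiver exactly how much to pull off the byte stream:

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <cstdint>
#include <cstring>
#include <vector>

// Send one framed message: 4-byte big-endian length, then the payload.
bool send_frame(int fd, const void* data, uint32_t len) {
    uint32_t hdr = htonl(len);
    if (write(fd, &hdr, 4) != 4) return false;
    return write(fd, data, len) == (ssize_t)len;
}

// Read one framed message back off the stream.
bool recv_frame(int fd, std::vector<char>& out) {
    uint32_t hdr;
    if (read(fd, &hdr, 4) != 4) return false;
    uint32_t len = ntohl(hdr);
    out.resize(len);
    size_t got = 0;
    while (got < len) {                 // short reads are normal on streams
        ssize_t n = read(fd, out.data() + got, len - got);
        if (n <= 0) return false;
        got += (size_t)n;
    }
    return true;
}
```

The loop in recv_frame is the important part: TCP gives you a byte stream, not packets, so one message may arrive across several reads.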
In this case, sockets are faster. Writing to shared memory is faster than any other IPC, but writing to a memory-mapped file and writing to shared memory are two completely different things.
When writing to a memory-mapped file, what was written to the mapped memory needs to be "flushed" to the actual backing file (not exactly by you; the flush is done for you). So you copy your data first into the mapped memory, and then it is copied again (flushed) to the actual file, and that is extremely expensive, more than anything else, even more than writing to a socket. You gain nothing by doing that.

Adaptive bandwidth allocation?

In our file transfer application the network performance was fair,
but we want to get the maximum network performance, and one way of achieving that is
adaptive bandwidth allocation, so that the application is forced to use all the
available bandwidth. Friends, if you have any white papers or code for reference,
it would be very helpful :)
thanks
krishna
If you just throw it at the TCP session with no control, it will transfer at full speed.
You could also compress the file as you transfer it. That will not accelerate the transfer itself, but will optimize the use of the network, at a CPU cost.
If that is not enough, the only [software] way to improve things further is to use multiple TCP sessions, which reduces the speed-limiting effect of latency on TCP flow control. I believe 5 concurrent transfers from different offsets of the same file will do the job; you won't get faster than that.
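The multi-session idea amounts to splitting the file into per-session byte ranges; each (offset, length) pair would then be fetched over its own TCP connection. A sketch (the function name and the 5-way split are just illustrative):

```cpp
#include <cstdint>
#include <vector>
#include <utility>

// Split `file_size` bytes into `sessions` contiguous (offset, length)
// ranges; each range would be transferred over its own TCP session.
std::vector<std::pair<uint64_t, uint64_t>>
split_ranges(uint64_t file_size, unsigned sessions) {
    std::vector<std::pair<uint64_t, uint64_t>> ranges;
    uint64_t base = file_size / sessions;
    uint64_t offset = 0;
    for (unsigned i = 0; i < sessions; ++i) {
        // the last range absorbs any remainder from the division
        uint64_t len = (i + 1 == sessions) ? file_size - offset : base;
        ranges.emplace_back(offset, len);
        offset += len;
    }
    return ranges;
}
```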
I don't think "adaptive bandwidth allocation" really means anything tangible (considering it's the #2 google hit for that expression!) but I'll try to give an answer that might help you ask a better question.
If an application's network activity can be parallelised (bittorrent is a good example of this) then this is one way of achieving faster network transfers.
In general though, for user space applications the networking conditions are going to be outside the application's control for good reasons. If a userspace application considers it part of its mandate to adjust or affect external operating system-level networking conditions I would consider it malware. QoS for example could be used to prioritise the traffic associated with your application but that is something you might want to suggest and explain in a deployment guide and not try to manage from within your application.

Boost asio async vs blocking reads, udp speed/quality

I have a quick and dirty proof of concept app that I wrote in C# that reads high data rate multicast UDP packets from the network. For various reasons the full implementation will be written in C++ and I am considering using boost asio. The C# version used a thread to receive the data using blocking reads. I had some problems with dropped packets if the computer was heavily loaded (generally with processing those packets in another thread).
What I would like to know is if the async read operations in boost (which use overlapped io in windows) will help ensure that I receive the packets and/or reduce the cpu time needed to receive the packets. The single thread doing blocking reads is pretty straightforward, using the async reads seems like a step up in complexity, but I think it would be worth it if it provided higher performance or dropped fewer packets on a heavily loaded system. Currently the data rate should be no higher than 60Mb/s.
I've written some multicast handling code using boost::asio also. I would say that overall, in my experience there is a lot of added complexity to doing things in asio that may not make it easy for other people you work with to understand the code you end up writing.
That said, presumably the argument in favour of moving to asio instead of using lots of different threads to do the work is that you would have to do less context switching. This would clearly be true on a single-core box, but what about when you go multi-core? Are you planning to offload the work you receive to threads or just have a single thread doing the processing work? If you go for a single threaded approach you are going to end up in a situation where you could drop packets waiting for that thread to process the work.
In the end it's swings and roundabouts. I'd say you want to get some fairly solid figures backing up your arguments for going down this route if you are going to do so, just because of all the added complexity it entails (a whole new paradigm for some people I'm sure).
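For reference, the reactor pattern that asio implements can be sketched on POSIX without boost at all: wait for readability with poll(), then drain the socket. The function names and the timeout are illustrative, and a real receiver would loop over many sockets and hand packets off for processing:

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// Create a UDP socket bound to loopback on a kernel-chosen port.
int make_bound_udp_socket(uint16_t* port_out) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       // let the kernel pick a port
    if (bind(fd, (sockaddr*)&addr, sizeof addr) != 0) { close(fd); return -1; }
    socklen_t len = sizeof addr;
    getsockname(fd, (sockaddr*)&addr, &len);
    *port_out = ntohs(addr.sin_port);
    return fd;
}

// Wait for a datagram with poll(), then read it without blocking forever.
// Returns bytes received, 0 on timeout, -1 on error.
ssize_t recv_with_poll(int fd, void* buf, size_t cap, int timeout_ms) {
    pollfd pfd{fd, POLLIN, 0};
    int r = poll(&pfd, 1, timeout_ms);
    if (r <= 0) return r;                    // 0 = timeout, -1 = error
    return recv(fd, buf, cap, 0);
}
```

Whether this (or asio's equivalent) drops fewer packets than a blocking thread mostly comes down to how quickly the socket is drained; enlarging SO_RCVBUF helps either design under load.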

Resources