Piping as interprocess communication - unix

I am interested in writing separate program modules that run as independent threads that I could hook together with pipes. The motivation would be that I could write and test each module completely independently, perhaps even write them in different languages, or run the different modules on different machines. There are a wide variety of possibilities here. I have used piping for a while, but I am unfamiliar with the nuances of its behaviour.
It seems like the receiving end will block waiting for input, which I would expect, but will the sending end block sometimes waiting for someone to read from the stream?
If I write an eof to the stream can I continue writing to that stream until I close it?
Are there differences in the behaviour of named and unnamed pipes?
Does it matter which end of the pipe I open first with named pipes?
Is the behaviour of pipes consistent between different Linux systems?
Does the behaviour of the pipes depend on the shell I'm using or the way I've configured it?
Are there any other questions I should be asking or issues I should be aware of if I want to use pipes in this way?

Wow, that's a lot of questions. Let's see if I can cover everything...
It seems like the receiving end will block waiting for input, which I would expect
You expect correctly: an actual 'read' call will block until something is there. However, I believe there are some C functions that will allow you to 'peek' at what (and how much) is waiting in the pipe. Unfortunately, I don't remember if this blocks as well.
will the sending end block sometimes waiting for someone to read from the stream
No, sending should never block. Think of the ramifications if this were a pipe across the network to another computer. Would you want to wait (through possibly high latency) for the other computer to respond that it received it? Now this is a different case if the reader handle of the destination has been closed. In this case, you should have some error checking to handle that.
If I write an eof to the stream can I continue writing to that stream until I close it
I would think this depends on what language you're using and its implementation of pipes. In C, I'd say no. In a Linux shell, I'd say yes. Someone else with more experience would have to answer that.
Are there differences in the behaviour of named and unnamed pipes?
As far as I know, yes. However, I don't have much experience with named vs unnamed. I believe the difference is:
Single direction vs Bidirectional communication
Reading AND writing to the "in" and "out" streams of a thread
Does it matter which end of the pipe I open first with named pipes?
Generally no, but you could run into problems on initialization trying to create and link the threads with each other. You'd need to have one main thread that creates all the sub-threads and syncs their respective pipes with each other.
Is the behaviour of pipes consistent between different Linux systems?
Again, this depends on the language, but generally yes. Ever heard of POSIX? That's the standard (at least for Linux; Windows does its own thing).
Does the behaviour of the pipes depend on the shell I'm using or the way I've configured it?
This is getting into a little more of a gray area. The answer should be no since the shell should essentially be making system calls. However, everything up until that point is up for grabs.
Are there any other questions I should be asking
The questions you've asked show that you have a decent understanding of the system. Keep researching and focus on what level you're going to be working at (shell, C, and so on). You'll learn a lot more by just trying it, though.

This is all based on a UNIX-like system; I'm not familiar with the specific behavior of recent versions of Windows.
It seems like the receiving end will block waiting for input, which I would expect, but will the sending end block sometimes waiting for someone to read from the stream?
Yes, although on a modern machine it may not happen often. The pipe has an intermediate buffer that can potentially fill up. If it does, the write side of the pipe will indeed block. But if you think about it, there aren't a lot of files that are big enough to risk this.
If I write an eof to the stream can I continue writing to that stream until I close it?
Um, you mean like a CTRL-D, 0x04? Sure, as long as the stream is set up that way. Viz.
506 # cat | od -c
abc
^D
efg
0000000 a b c \n 004 \n e f g \n
0000012
Are there differences in the behaviour of named and unnamed pipes?
Yes, but they're subtle and implementation dependent. The biggest one is that you can write to a named pipe before the other end is running; with unnamed pipes, the file descriptors get shared during the fork/exec process, so there's no way to access the transient buffer without the processes being up.
Does it matter which end of the pipe I open first with named pipes?
Nope.
Is the behaviour of pipes consistent between different Linux systems?
Within reason, yes. Buffer sizes etc may vary.
Does the behaviour of the pipes depend on the shell I'm using or the way I've configured it?
No. When you create a pipe, under the covers what happens is your parent process (the shell) creates a pipe which has a pair of file descriptors, then does a fork exec like this pseudocode:
Parent:
    create pipe, returning two file descriptors, call them fd[0] and fd[1]
    fork write-side process
    fork read-side process
Write-side:
    close fd[0]
    connect fd[1] to stdout
    exec writer program
Read-side:
    close fd[1]
    connect fd[0] to stdin
    exec reader program
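For concreteness, here is a minimal C sketch of that sequence; the program names "writer" and "reader" are placeholders for whatever the two sides of your pipeline are, and error handling is pared down to the essentials.
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) == -1) {            /* fd[0] = read end, fd[1] = write end */
        perror("pipe");
        return 1;
    }
    if (fork() == 0) {               /* write-side child */
        close(fd[0]);                /* not reading from the pipe */
        dup2(fd[1], STDOUT_FILENO);  /* connect the write end to stdout */
        close(fd[1]);
        execlp("writer", "writer", (char *)NULL);
        _exit(127);                  /* only reached if exec fails */
    }
    if (fork() == 0) {               /* read-side child */
        close(fd[1]);                /* not writing to the pipe */
        dup2(fd[0], STDIN_FILENO);   /* connect the read end to stdin */
        close(fd[0]);
        execlp("reader", "reader", (char *)NULL);
        _exit(127);
    }
    close(fd[0]);                    /* the parent must close both ends, */
    close(fd[1]);                    /* or the reader never sees EOF     */
    while (wait(NULL) > 0)
        ;
    return 0;
}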
Are there any other questions I should be asking or issues I should be aware of if I want to use pipes in this way?
Is everything you want to do really going to lay out in a line like this? If not, you might want to think about a more general architecture. But the insight that having lots of separate processes interacting through the "narrow" interface of a pipe is desirable is a good one.
[Updated: I had the file descriptor indices reversed at first. They're correct now, see man 2 pipe.]

As Dashogun and Charlie Martin noted, this is a big question. Some parts of their answers are inaccurate, so I'm going to answer too.
I am interested in writing separate program modules that run as independent threads that I could hook together with pipes.
Be wary of trying to use pipes as a communication mechanism between threads of a single process. Because you would have both read and write ends of the pipe open in a single process, you would never get the EOF (zero bytes) indication.
If you were really referring to processes, then this is the basis of the classic Unix approach to building tools. Many of the standard Unix programs are filters that read from standard input, transform it somehow, and write the result to standard output. For example, tr, sort, grep, and cat are all filters, to name but a few. This is an excellent paradigm to follow when the data you are manipulating permits it. Not all data manipulations are conducive to this approach, but there are many that are.
The motivation would be that I could write and test each module completely independently, perhaps even write them in different languages, or run the different modules on different machines.
Good points. Be aware that there isn't really a pipe mechanism between machines, though you can get close to it with programs such as rsh or (better) ssh. However, internally, such programs may read local data from pipes and send that data to remote machines, but they communicate between machines over sockets, not using pipes.
There are a wide variety of possibilities here. I have used piping for a while, but I am unfamiliar with the nuances of its behaviour.
OK; asking questions is one (good) way to learn. Experimenting is another, of course.
It seems like the receiving end will block waiting for input, which I would expect, but will the sending end block sometimes waiting for someone to read from the stream?
Yes. There is a limit to the size of a pipe buffer. Classically, this was quite small - 4096 or 5120 were common values. You may find that modern Linux uses a larger value. You can use fpathconf() and _PC_PIPE_BUF to find out the size of a pipe buffer. POSIX only requires the buffer to be 512 (that is, _POSIX_PIPE_BUF is 512).
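If you want to check what your own system reports, here is a tiny sketch; note that _PC_PIPE_BUF gives you PIPE_BUF, the guaranteed atomic-write size, and the pipe's total capacity may be larger than that on some systems.
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) != 0)
        return 1;
    /* PIPE_BUF for this descriptor: at least _POSIX_PIPE_BUF (512) */
    long pipe_buf = fpathconf(fd[1], _PC_PIPE_BUF);
    printf("PIPE_BUF on this pipe: %ld\n", pipe_buf);
    return 0;
}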
If I write an eof to the stream can I continue writing to that stream until I close it?
Technically, there is no way to write EOF to a stream; you close the pipe descriptor to indicate EOF. If you are thinking of control-D or control-Z as an EOF character, then those are just regular characters as far as pipes are concerned - they only have an effect like EOF when typed at a terminal that is running in canonical mode (cooked, or normal).
Are there differences in the behaviour of named and unnamed pipes?
Yes, and no. The biggest differences are that unnamed pipes must be set up by one process and can only be used by that process and children who share that process as a common ancestor. By contrast, named pipes can be used by previously unassociated processes. The next big difference is a consequence of the first; with an unnamed pipe, you get back two file descriptors from a single function (system) call to pipe(), but you open a FIFO or named pipe using the regular open() function. (Someone must create a FIFO with the mkfifo() call before you can open it; unnamed pipes do not need any such prior setup.) However, once you have a file descriptor open, there is precious little difference between a named pipe and an unnamed pipe.
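To illustrate that last point, here is a hedged sketch of the named-pipe side (the path /tmp/demo_fifo is made up for the example): only mkfifo() and open() are FIFO-specific, and after that the descriptor is used just like the write end returned by pipe(). Something like cat /tmp/demo_fifo in another terminal would act as the reader.
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    if (mkfifo("/tmp/demo_fifo", 0600) == -1)
        perror("mkfifo");            /* EEXIST is harmless if it already exists */
    /* open() blocks until some other process opens the FIFO for reading */
    int fd = open("/tmp/demo_fifo", O_WRONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    /* from here on, it behaves like the write end of an unnamed pipe */
    write(fd, "hello\n", 6);
    close(fd);
    return 0;
}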
Does it matter which end of the pipe I open first with named pipes?
No. The first process to open the FIFO will (normally) block until there's a process with the other end open. If you open it for reading and writing (unconventional, but possible) then you won't be blocked; if you use the O_NONBLOCK flag, you won't be blocked.
Is the behaviour of pipes consistent between different Linux systems?
Yes. I've not heard of or experienced any problems with pipes on any of the systems where I've used them.
Does the behaviour of the pipes depend on the shell I'm using or the way I've configured it?
No: pipes and FIFOs are independent of the shell you use.
Are there any other questions I should be asking or issues I should be aware of if I want to use pipes in this way?
Just remember that you must close the reading end of a pipe in the process that will be writing, and the writing end of the pipe in the process that will be reading. If you want bidirectional communication over pipes, use two separate pipes. If you create complicated plumbing arrangements, beware of deadlock - it is possible. A linear pipeline does not deadlock, however (though if the first process never closes its output, the downstream processes may wait indefinitely).
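Here is a rough sketch of that two-pipe arrangement between a parent and a child (most error checking omitted); note how each side closes the pipe ends it does not use before doing any I/O, which is what keeps the EOF and deadlock behaviour sane.
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int to_child[2], to_parent[2];
    char buf[64];
    ssize_t n;

    pipe(to_child);                  /* parent writes, child reads */
    pipe(to_parent);                 /* child writes, parent reads */
    if (fork() == 0) {               /* child */
        close(to_child[1]);          /* close the ends this side does not use */
        close(to_parent[0]);
        n = read(to_child[0], buf, sizeof(buf));
        if (n > 0)
            write(to_parent[1], buf, (size_t)n);   /* echo the request back */
        close(to_child[0]);
        close(to_parent[1]);
        _exit(0);
    }
    close(to_child[0]);              /* parent: close unused ends too */
    close(to_parent[1]);
    write(to_child[1], "ping\n", 5);
    close(to_child[1]);              /* lets the child see EOF */
    n = read(to_parent[0], buf, sizeof(buf));
    if (n > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    close(to_parent[0]);
    wait(NULL);
    return 0;
}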
I observed both above and in comments to other answers that pipe buffers are classically limited to quite small sizes. @Charlie Martin counter-commented that some versions of Unix have dynamic pipe buffers and these can be quite large.
I'm not sure which ones he has in mind. I used the test program that follows on Solaris, AIX, HP-UX, MacOS X, Linux and Cygwin / Windows XP (results below):
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>

static const char *arg0;

static void err_syserr(char *str)
{
    int errnum = errno;
    fprintf(stderr, "%s: %s - (%d) %s\n", arg0, str, errnum, strerror(errnum));
    exit(1);
}

int main(int argc, char **argv)
{
    int pd[2];
    pid_t kid;
    size_t i = 0;
    char buffer[2] = "a";
    int flags;

    arg0 = argv[0];
    if (pipe(pd) != 0)
        err_syserr("pipe() failed");
    if ((kid = fork()) < 0)
        err_syserr("fork() failed");
    else if (kid == 0)
    {
        /* Child: hold the read end open but never read; wait for a signal */
        close(pd[1]);
        pause();
    }
    /* else */
    close(pd[0]);
    if ((flags = fcntl(pd[1], F_GETFL)) == -1)
        err_syserr("fcntl(F_GETFL) failed");
    flags |= O_NONBLOCK;
    if (fcntl(pd[1], F_SETFL, flags) == -1)
        err_syserr("fcntl(F_SETFL) failed");
    while (write(pd[1], buffer, sizeof(buffer)-1) == sizeof(buffer)-1)
    {
        putchar('.');
        if (++i % 50 == 0)
            printf("%u\n", (unsigned)i);
    }
    if (i % 50 != 0)
        printf("%u\n", (unsigned)i);
    kill(kid, SIGINT);
    return 0;
}
I'd be curious to get extra results from other platforms. Here are the sizes I found. All the results are larger than I expected, I must confess, but Charlie and I may be debating the meaning of 'quite large' when it comes to buffer sizes.
8196 - HP-UX 11.23 for IA-64 (fcntl(F_SETFL) failed)
16384 - Solaris 10
16384 - MacOS X 10.5 (O_NONBLOCK did not work, though fcntl(F_SETFL) did not fail)
32768 - AIX 5.3
65536 - Cygwin / Windows XP (O_NONBLOCK did not work, though fcntl(F_SETFL) did not fail)
65536 - SuSE Linux 10 (and CentOS) (fcntl(F_SETFL) failed)
One point that is clear from these tests is that O_NONBLOCK works with pipes on some platforms and not on others.
The program creates a pipe, and forks. The child closes the write end of the pipe, and then goes to sleep until it gets a signal - that's what pause() does. The parent then closes the read end of the pipe, and sets the flags on the write descriptor so that it won't block on an attempt to write on a full pipe. It then loops, writing one character at a time, and printing a dot for each character written, and a count and newline every 50 characters. When it detects a write problem (buffer full, since the child is not reading a thing), it stops the loop, writes the final count, and kills the child.

Related

unix fifo concurrent read from a named pipe leaves one of the processes unterminated

I have created a FIFO (named pipe) in Solaris, and I write the contents of a file into it, line by line, as below:
$ mkfifo namepipe
$ cat books.txt
"how to write unix code"
"how to write oracle code"
$ cat books.txt >> namepipe &
I have a readpipe.sh script which reads the named pipe in parallel like this:
# readpipe.sh
while IFS=',' read var
do
  echo "$var" >> log.txt
done < namepipe
I am calling the readpipe.sh like
readpipe.sh &
sleep 2
readpipe.sh &
I have introduced a sleep 2 to avoid a race condition occurring, i.e. two processes getting parts of the same line, like process 1 gets
"how to"
and process 2 gets
"write unix code"
The problem I am facing is that when all the contents of namepipe have been consumed, the first background process exits, while the second keeps on running without completing.
The logic in the script is simplified here for clarity. The actual readpipe.sh does a lot more.
Kindly help me understand what is going on.
I have introduced a sleep 2 to avoid a race condition occurring
First, that's not going to work. sleep does not explicitly synchronize anything.
Second, you can't "share" reading from a pipe unless each reading process knows how much to try to read at a time. And read in a sh script isn't going to give you any control over how many bytes the sh binary actually reads from the pipe with the low-level read() call it uses. The sh process likely tries to read PIPE_BUF bytes, but it can try to read any amount, since it's really just feeding a stream; you have no control over how much it reads.
For example, each process needs to know that the next read from the pipe needs to be 143 bytes, and then it issues a low-level read( fd, buffer, 143 ); call to read only 143 bytes.
And even if you can control at the lowest level how much each process reads, each process needs to know how much to read, which in this case you can't know.
Having a known-in-advance read size is necessary to do what you want, and you don't have that when using scripts. Note, though, it may not be sufficient - even if you solve the shared-reader problem, you may run into other issues trying share readers.
What's almost certainly happening in this case is your first script to run consumes all the data passed into the pipe, then closes and continues when there's no more data. THEN the second script opens the pipe and waits for data - but there isn't any, so it just blocks.
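To make the known-read-size idea concrete, here is a rough sketch of one such reader in C; the 64-byte record length is invented for illustration, and it assumes the writer pads every record to exactly that length and emits one record per write() (with 64 <= PIPE_BUF, so each write is atomic).
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define RECORD_SIZE 64               /* assumed fixed record length */

int main(void)
{
    char record[RECORD_SIZE];
    ssize_t n;
    int fd = open("namepipe", O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    for (;;) {
        /* each low-level read asks for exactly one record */
        n = read(fd, record, RECORD_SIZE);
        if (n <= 0)                  /* 0 = all writers closed, <0 = error */
            break;
        fwrite(record, 1, (size_t)n, stdout);
    }
    close(fd);
    return 0;
}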

printf messages show up after about 10 of them are sent [duplicate]

Why does printf not flush after the call unless a newline is in the format string? Is this POSIX behavior? How might I have printf immediately flush every time?
The stdout stream is line buffered by default, so will only display what's in the buffer after it reaches a newline (or when it's told to). You have a few options to print immediately:
Print to stderr instead using fprintf (stderr is unbuffered by default):
fprintf(stderr, "I will be printed immediately");
Flush stdout whenever you need it to using fflush:
printf("Buffered, will be flushed");
fflush(stdout); // Will now print everything in the stdout buffer
Disable buffering on stdout by using setbuf:
setbuf(stdout, NULL);
Or use the more flexible setvbuf:
setvbuf(stdout, NULL, _IONBF, 0);
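A throwaway demo of the difference, assuming stdout is attached to a terminal: without the fflush() call the dots would normally appear all at once when the final newline goes out, rather than one per second.
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int i;
    for (i = 0; i < 5; i++) {
        printf(".");                 /* no newline: sits in the stdout buffer */
        fflush(stdout);              /* force it out now, one dot per second */
        sleep(1);
    }
    printf("\ndone\n");              /* the newline flushes a line-buffered stdout */
    return 0;
}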
No, it's not POSIX behaviour, it's ISO behaviour (well, it is POSIX behaviour, but only insofar as POSIX conforms to ISO).
Standard output is line buffered if it can be detected to refer to an interactive device, otherwise it's fully buffered. So there are situations where printf won't flush, even if it gets a newline to send out, such as:
myprog >myfile.txt
This makes sense for efficiency since, if you're interacting with a user, they probably want to see every line. If you're sending the output to a file, it's most likely that there's not a user at the other end (though not impossible, they could be tailing the file). Now you could argue that the user wants to see every character but there are two problems with that.
The first is that it's not very efficient. The second is that the original ANSI C mandate was to primarily codify existing behaviour, rather than invent new behaviour, and those design decisions were made long before ANSI started the process. Even ISO nowadays treads very carefully when changing existing rules in the standards.
As to how to deal with that, if you fflush (stdout) after every output call that you want to see immediately, that will solve the problem.
Alternatively, you can use setvbuf before operating on stdout, to set it to unbuffered and you won't have to worry about adding all those fflush lines to your code:
setvbuf (stdout, NULL, _IONBF, BUFSIZ);
Just keep in mind that may affect performance quite a bit if you are sending the output to a file. Also keep in mind that support for this is implementation-defined, not guaranteed by the standard.
ISO C99 section 7.19.3/3 is the relevant bit:
When a stream is unbuffered, characters are intended to appear from the source or at the destination as soon as possible. Otherwise characters may be accumulated and transmitted to or from the host environment as a block.
When a stream is fully buffered, characters are intended to be transmitted to or from the host environment as a block when a buffer is filled.
When a stream is line buffered, characters are intended to be transmitted to or from the host environment as a block when a new-line character is encountered.
Furthermore, characters are intended to be transmitted as a block to the host environment when a buffer is filled, when input is requested on an unbuffered stream, or when input is requested on a line buffered stream that requires the transmission of characters from the host environment.
Support for these characteristics is implementation-defined, and may be affected via the setbuf and setvbuf functions.
It's probably like that because of efficiency and because if you have multiple programs writing to a single TTY, this way you don't get characters on a line interlaced. So if program A and B are outputting, you'll usually get:
program A output
program B output
program B output
program A output
program B output
This stinks, but it's better than
proprogrgraam m AB ououtputputt
prproogrgram amB A ououtputtput
program B output
Note that it isn't even guaranteed to flush on a newline, so you should flush explicitly if flushing matters to you.
To immediately flush call fflush(stdout) or fflush(NULL) (NULL means flush everything).
stdout is buffered, so it will only output after a newline is printed.
To get immediate output, either:
Print to stderr.
Make stdout unbuffered.
Note: Microsoft runtime libraries do not support line buffering, so printf("will print immediately to terminal") does just that:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setvbuf
By default, stdout is line buffered, stderr is unbuffered, and file streams are fully buffered.
You can fprintf to stderr, which is unbuffered, instead. Or you can flush stdout when you want to. Or you can set stdout to unbuffered.
Use setbuf(stdout, NULL); to disable buffering.
There are generally two levels of buffering:
1. The kernel buffer cache (makes read/write faster)
2. Buffering in the I/O library (reduces the number of system calls)
Let's take the example of fprintf() and write().
When you call fprintf(), it doesn't write directly to the file. It first goes to the stdio buffer in the program's memory. From there it is written to the kernel buffer cache using the write() system call. So one way to skip the I/O buffer is to use write() directly. Another way is to call setbuf(stream, NULL). This sets the buffering mode to no buffering, and data is written directly to the kernel buffer.
To force the data to be moved to the kernel buffer, we can print a "\n", which, under the default line-buffering mode, will flush the I/O buffer.
Or we can use fflush(FILE *stream).
Now we are in the kernel buffer. The kernel (/OS) wants to minimise disk access time and hence reads/writes only whole disk blocks. So when a read() is issued, which is a system call and can be invoked directly or through fscanf(), the kernel reads the disk block from disk and stores it in a buffer. After that, data is copied from there to user space.
Similarly, the fprintf() data received from the I/O buffer is written to the disk by the kernel. This makes read() and write() faster.
Now, to force the kernel to initiate a write(), after which data transfer is controlled by the hardware controllers, there are a few ways. We can open the file with O_SYNC or similar flags. Or we can use functions like fsync(), fdatasync(), or sync() to make the kernel initiate writes as soon as data is available in the kernel buffer.
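A small sketch of the two levels described above (the file name example.log is arbitrary): fflush() moves the data from the stdio buffer into the kernel buffer cache, and fsync() then asks the kernel to push it out to the disk.
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    FILE *fp = fopen("example.log", "w");
    if (fp == NULL)
        return 1;
    fprintf(fp, "some record\n");    /* lands in the stdio buffer (user space) */
    fflush(fp);                      /* stdio buffer -> kernel buffer cache */
    fsync(fileno(fp));               /* ask the kernel to write it to disk */
    fclose(fp);
    return 0;
}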

Is it possible to write with several processors to the same file, at the end of the file, in an ordered way?

I have 2 processors (this is an example), and I want these 2 processors to write to a file. I want them to write at the end of the file, but not in a mixed pattern, like this:
[file content]
proc0
proc1
proc0
proc1
proc0
proc1
(and so on..)
I'd like to make them write following this kind of pattern:
[file content]
proc0
proc0
proc0
proc1
proc1
proc1
(and so on..)
Is it possible? If so, what's the setting to use?
The sequence in which your processes have outputs ready to report is, essentially, unknowable in advance. Even repeated runs of exactly the same MPI program will show differences in the ordering of outputs. So something, somewhere, is going to have to impose an ordering on the writes to the file.
A very common pattern, the one Wesley has already mentioned, is to have all processes send their outputs to one process, often process 0, and let it deal with writing to the file. This master-writer could sort the outputs before writing, but this creates a couple of problems: allocating space to store output before writing it and, more difficult to deal with, determining when a collection of output records can be sorted and written to file and the output buffers reused. How long does the master-writer wait, and how does it know if a process is still working?
So it's common to have the master-writer write outputs as it gets them and for another program to order the output file as desired after the parallel program has finished. You could tack this on to your parallel program as a step after mpi_finalize or you could use a completely separate program (such as sort on a Linux machine). Of course, for this to work each output record has to contain some sequencing information on which to sort.
Another common pattern is to only have one process which does any writing at all, that is, none of the other processes do any output at all. This completely avoids the non-determinism of the sequencing of the writing.
Another pattern, less common partly because it is more difficult to implement and partly because it depends on underlying mechanisms which are not always available, is to use MPI-IO. With MPI-IO, multiple processes can write to different parts of a file as if simultaneously. To actually write simultaneously, the program needs to be executing on hardware, a network and an operating system which support parallel I/O. It can be tricky to implement this even with the right platform, and especially when the volume of output from processes is uncertain.
In my experience here on SO, people asking questions such as yours are probably at too early a stage in their MPI experience to be tackling parallel I/O, even if they have access to the necessary hardware.
I disagree with High Performance Mark. MPI-IO isn't so tricky in 2014 (as long as you have access to any file system besides NFS -- install PVFS if you need a cheap, easy parallel file system).
If you know how much data each process has, you can use MPI_SCAN to efficiently compute how much data was written by "earlier" processes, then use MPI_FILE_WRITE_AT_ALL to carry out the I/O efficiently. Here's one way you might do this:
long long incr, new_offset;
incr = (long long)count * datatype_size;
MPI_Scan(&incr, &new_offset, 1, MPI_LONG_LONG_INT,
         MPI_SUM, MPI_COMM_WORLD);
new_offset -= incr;   /* MPI_Scan is inclusive, so subtract this rank's own contribution */
MPI_File_write_at_all(mpi_fh, new_offset, buf, count,
                      datatype, status);
The answer to your question is no. If you do things that way, you'll end up with jumbled output from all over the place.
However, you can get the same result by sending your output to a single processor and having it do all of the writing itself. For example, at the end of your application, just have everything sent to rank 0 and have rank 0 write it all to a file.

Why does System V shared memory have separate get and attach functions?

Using System V shared memory IPC requires calls to the following two functions:
int shmget(key_t key, size_t size, int shmflg);
void *shmat(int shmid, const void *shmaddr, int shmflg);
Why are they designed to be separate, instead of having a single function that accepts these arguments, performs both functions and simply returns the address?
We can consider files as an analogy. open on a string (the file path) gives us a file descriptor, and we use that to read/write from the file. We call close on the file descriptor when we're done. This design seems natural; we don't have to open with a string to get a descriptor and then attach to the descriptor.
As an example of what I have in mind, take a look at the FreeBSD sendmail shared memory implementation.
This kind of separation (shm_open and mmap) also exists with POSIX shared memory, but the reason was that mmap existed before shm_open was implemented and could be reused, and mmap requires a descriptor (source: UNIX Network Programming Vol. 2, R. Stevens, chapter 13, page 326).
Shared memory is probably one of the fastest ways of allowing for IPC, as data need not be copied; the problem associated with it, though, is synchronizing access between multiple threads. You could do this using semaphores or record locks. We end up using the latter in Unix for shared memory even though they are not as efficient, because they are simple, the system cleans up well, and you don't need some of the bling that semaphores bring along.
Let's look at how these work to understand why they are implemented this way.
In comes the shmid_ds structure used by the Linux kernel (http://www.tldp.org/LDP/lpg/node68.html).
Its shm_nattch field is the unsigned int counter of current attaches. shmget gets you a shm id and sets things like the ipc_perm permissions, dates, pid, atime, ctime, and the requested segment size (shm_segsz).
Next, shmctl kicks in and does the IPC control work using IPC_STAT, IPC_RMID and IPC_SET: setting permissions, getting or removing the shm id for a segment, or even locking or unlocking it.
Once the segment is ready, shmat is used by a process to attach it to its address space, depending on the flags and address parameters. Once it attaches, the kernel increments shm_nattch. When done, we call shmdt to detach. Removal of the identifier and the associated data structure is not automatic; some process has to do it by calling shmctl with IPC_RMID, subject to shm_perm.
As you can see this is all very similar to how one would use semaphores and the implementation makes sense.
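A bare-bones sketch of that sequence in C (the key 0x1234 and the 4096-byte size are arbitrary values for illustration):
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    /* 1. get (or create) the segment: reserves it and fills in shmid_ds */
    int shmid = shmget((key_t)0x1234, 4096, IPC_CREAT | 0600);
    if (shmid == -1) {
        perror("shmget");
        return 1;
    }
    /* 2. attach: maps the segment into this process and bumps shm_nattch */
    char *addr = shmat(shmid, NULL, 0);
    if (addr == (char *)-1) {
        perror("shmat");
        return 1;
    }
    strcpy(addr, "hello from shared memory");
    /* 3. detach: decrements shm_nattch; the segment itself still exists */
    shmdt(addr);
    /* 4. explicit removal: nothing is cleaned up until some process does this */
    shmctl(shmid, IPC_RMID, NULL);
    return 0;
}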
One possible reason I could think of is this:
(From the manpage of shmget)
After a fork(2) the child inherits the attached shared memory segments.
After an execve(2) all attached shared memory segments are detached from the process.
Upon _exit(2) all attached shared memory segments are detached from the process.
Well, technically attaching and detaching is basic reference counting on the shared memory segment that is reserved during shmget.
The functionality of allocating the shared memory segment (via shmget) and of reference counting it (up or down, via shmat and shmdt respectively) is kept separate so that code can be reused during fork and exec.
If they were both packed into the same function, you would still need a separate function that just does the reference counting (to be invoked during fork/exec). So I think this design is simply to promote code reuse and avoid duplication.

Techniques for infinitely long pipes

There are two really simple ways to let one program send a stream of data to another:
Unix pipes, TCP sockets, or something like that. This requires constant attention from the consumer program, or the producer program will block. Even increasing the buffers beyond their typically tiny defaults, it's still a huge problem.
Plain files - the producer program appends with O_APPEND, the consumer just reads whatever new data has become available at its convenience. This doesn't require any synchronization (as long as disk space is available), but Unix files only support truncating at the end, not at the beginning, so this will fill up the disk until both programs quit.
Is there a simple way to have it both ways, with data stored on disk until it gets read, and then freed? Obviously programs could communicate via a database server or something like that and not have this problem, but I'm looking for something that integrates well with normal Unix piping.
A relatively simple hand-rolled solution.
You could have the producer create files and keep writing until it gets to a certain size/number of records, whatever suits your application. The producer then closes the file and starts a new one with an agreed naming algorithm.
The consumer reads new records from a file; when that file reaches the agreed maximum size, the consumer closes and unlinks it and then opens the next one.
If your data can be split into blocks or transactions of some sort, you can use the file method for this with a serial number. The data producer would store the first megabyte of data in outfile.1, the next in outfile.2 etc. The consumer can read the files in order and delete them when read. Thus you get something like your second method, with cleanup along the way.
You should probably wrap all this in a library, so that from the application's point of view this is a pipe of some sort.
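Here is a sketch of the producer half of such a library in C; the outfile.N naming scheme and the 1 MB rollover threshold are invented for the example. The consumer side would read the numbered files in order and unlink() each one when it is done with it; a more careful version would write each chunk under a temporary name and rename() it into place when complete, so the consumer never opens a half-written file.
#include <stdio.h>
#include <string.h>

#define CHUNK_LIMIT (1024L * 1024L)  /* roll over after roughly 1 MB per file */

int main(void)
{
    char name[64];
    char line[4096];
    long written = 0;
    int seq = 1;
    FILE *out;

    snprintf(name, sizeof(name), "outfile.%d", seq);
    out = fopen(name, "w");
    if (out == NULL)
        return 1;
    /* copy stdin into a series of numbered files */
    while (fgets(line, sizeof(line), stdin) != NULL) {
        fputs(line, out);
        written += (long)strlen(line);
        if (written >= CHUNK_LIMIT) {            /* time to start the next file */
            fclose(out);
            snprintf(name, sizeof(name), "outfile.%d", ++seq);
            out = fopen(name, "w");
            if (out == NULL)
                return 1;
            written = 0;
        }
    }
    fclose(out);
    return 0;
}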
You should read some documentation on socat. You can use it to bridge the gap between tcp sockets, fifo files, pipes, stdio and others.
If you're feeling lazy, there are some nice examples of useful commands.
I'm not aware of anything, but it shouldn't be too hard to write a small utility that takes a directory as an argument (or uses $TMPDIR) and uses select/poll to multiplex between reading from stdin, paging to a series of temporary files, and writing to stdout.
