Page 32 of the book "The UNIX Programming Environment" makes this profound statement about UNIX pipes:
The programs in a pipeline actually run at the same time, not one
after another. This means that the programs in a pipeline can be
interactive; the kernel looks after whatever scheduling and
synchronization is needed to make it all work.
Wow! By using pipes I can get parallel processing for free!
"I've got to illustrate this awesome capability to my colleagues" I thought. I will implement a demo: create a simple interactive tool that mimics what I type in at the command window. I'll name that tool mimic. Use a pipe to connect it to another tool which counts the number of lines and characters. I'll name that tool wc (sadly, I am working on Windows, which doesn't have the UNIX wc program so I must implement my own). The two tools will run on a command line like this:
mimic | wc
The output of mimic is piped into wc.
Well, that's the idea.
I implemented the mimic tool with a very simple Flex lexer, which I show below. Compiling it generated mimic.exe.
When I run mimic.exe from the command line it does indeed mimic what I type:
> mimic
hello world
hello world
greetings
greetings
ctrl-c
I implemented wc using AWK. The AWK program (wc.awk) is shown below. When I run wc from the command line it does indeed count the lines and characters:
> echo Hello World | awk -f wc.awk
lines 1 chars 13
However, when I put them together with a pipe, they don't work as I imagined they would. Here's a sample run:
> mimic | awk -f wc.awk
hello world
greetings
ctrl-c
Hmm, nothing. No mimicking. No line counts. No char counts.
How come it's not working? What am I doing wrong, please?
What can I do to make it work as I expected? I expected it to work like this: I type something in at the command line. mimic repeats it, sending it to the pipe which sends it to wc which reports the number of lines and characters. I type in the next thing at the command line. mimic repeats it, sending it to the pipe which sends it to wc which reports the number of lines and characters. And so forth. That's the behavior I thought I was implementing.
Here is my simple Flex lexer (mimic):
%option noyywrap
%option always-interactive
%%
%%
int main(int argc, char *argv[])
{
yyin = stdin;
yylex();
return 0;
}
Here is my simple AWK program (wc):
{ nchars = nchars + length($0) + 1 }
END { printf("lines %-10d chars %d\n", NR, nchars) }
This question has little or nothing to do with Flex, Bison, or Awk, and not much to do with Unix either (since you're experimenting on Windows).
I don't have Windows handy, but the underlying issue is basically about stdio buffering, so it's reproducible on Unix as well.
To simplify, I only implemented mimic, which I did directly rather than using Flex (which is clearly overkill):
#include <stdio.h>
int main(void) {
for (int ch; (ch = getchar()) != EOF; ) putchar(ch);
return 0;
}
Since you use %option always-interactive, which forces Flex to read one character at a time with fgetc(), my version makes basically the same sequence of standard library calls as your program, except that the one-byte fwrite is simplified to the equivalent putchar.
It certainly has the same execution characteristic:
$ ./mimic
Here we go round the mulberry busy,
Here we go round the mulberry busy,
the mulberry bush, the mulberry bush.
the mulberry bush, the mulberry bush.
In the above, I signalled end-of-input by typing Ctrl-D for the third line. On Windows, I would have had to type Ctrl-Z followed by Enter to get the same effect. If I kill the execution by typing Ctrl-C instead, I get roughly the same result (other than the fact that Ctrl-C shows up in the console):
$ ./mimic
Here we go round the mulberry bush,
Here we go round the mulberry bush,
the mulberry bush, the mulberry bush.
the mulberry bush, the mulberry bush.
^C
Now, since mimic just copies stdin to stdout, I might expect to be able to pipe it into itself and get the same result. But the output is a little different:
$ ./mimic | ./mimic
Here we go round the mulberry bush,
the mulberry bush, the mulberry bush.
Here we go round the mulberry bush,
the mulberry bush, the mulberry bush.
Again, I signaled end of input by typing Ctrl-D. And it was only after I typed the Ctrl-D that any output appeared; at that point, both lines were echoed. And if I terminate the program abruptly with Ctrl-C, I don't get any output at all:
$ ./mimic | ./mimic
Here we go round the mulberry bush,
the mulberry bush, the mulberry bush.
^C
OK, that's the data. Now, we need an explanation. And the explanation has to do with the way C standard library streams are buffered, which you can read about in the setvbuf manpage (and in many places on the web). I'll summarise:
The C standard specifies that all input functions execute "as if" implemented by repeated calls to fgetc, and all output functions "as if" implemented by repeated calls to fputc. Note that this means that there is nothing special about the chunk of bytes written by a single printf or fwrite. The library does not do anything to ensure that the sequence of calls is atomic. If you have two processes both writing to stdout, the messages can get interleaved, and that will happen from time to time.
The data written by fputc (and, consequently, all stdio output functions) is actually placed into an output buffer. This buffer is not part of the Operating System (which may add another layer of buffering). It's strictly part of the stdio library functions, which are ordinary userland functions. You could write them yourself (and it's a pretty good exercise to do so). From time to time, the contents of this buffer are sent to the appropriate operating system interface in order to be transferred to the output device.
"From time to time" is deliberately unspecific. There are three standard buffering modes, although the standard doesn't require them all to be used by a particular library implementation, and it also doesn't restrict the library implementation from using different buffering modes (although the three specified modes do basically cover the useful possibilities). However, most C library implementations you're likely to use do implement all three, pretty well as described in the standard. So take the following as a description of a common implementation technique, but be aware that on certain idiosyncratic platforms, it might not be accurate.
The three buffering modes are:
Unbuffered. In this mode, each byte written is transferred to the operating system (and, it is hoped, to the actual output device) as soon as possible.
Fully buffered. In this mode, there is a buffer of a predetermined size (often 8 kilobytes, but different library implementations have different defaults for different platforms). If you want to, you can supply your own buffer (of an arbitrary size) for a particular output stream, using the setvbuf standard library function (q.v.). Fully-buffered output might stay in the output buffer until it is full (although a given implementation may release output earlier). The buffer will, however, be sent to the operating system if you call fflush or fclose on the stream, or if fclose is called automatically when main returns. (It's not sent if the process dies, though.)
Line-buffered. In this mode, the stream again has an output buffer of a predetermined size, which is usually exactly the same as the buffer used in "fully-buffered" mode. The difference is that the buffer is also sent to the operating system when a new line character ('\n') is written. (If the buffer gets full before an end-of-line character is written, then it is sent to the operating system, just as in Fully-Buffered mode. But most of the time, lines will be fairly short, so the buffer will be sent to the OS at the end of each line.)
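To make the three modes concrete, here is a minimal sketch of a userland output buffer sitting on top of the POSIX write() call. All the names (mini_out, mini_putc, mini_flush, the MINI_* modes) are made up for illustration, and the 8192-byte size is arbitrary; this is not how any particular C library implements stdio, only the general idea.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names throughout; a sketch of the idea, not a real libc. */
enum mini_mode { MINI_UNBUFFERED, MINI_LINE, MINI_FULL };

struct mini_out {
    int fd;                 /* destination file descriptor */
    enum mini_mode mode;    /* decides when data is handed to the OS */
    size_t used;            /* bytes currently waiting in buf */
    char buf[8192];         /* userland buffer; the size is arbitrary */
};

/* Hand everything in the buffer to the OS. Returns 0 on success, -1 on error. */
int mini_flush(struct mini_out *o) {
    size_t done = 0;
    while (done < o->used) {
        ssize_t n = write(o->fd, o->buf + done, o->used - done);
        if (n < 0) {
            /* Keep the unwritten tail so a later flush can retry it. */
            memmove(o->buf, o->buf + done, o->used - done);
            o->used -= done;
            return -1;
        }
        done += (size_t)n;
    }
    o->used = 0;
    return 0;
}

/* Store one byte, then flush if the buffering mode calls for it. */
int mini_putc(struct mini_out *o, char c) {
    /* The buffer can only be full here if an earlier flush failed. */
    if (o->used == sizeof o->buf && mini_flush(o) != 0)
        return -1;
    o->buf[o->used++] = c;
    if (o->mode == MINI_UNBUFFERED ||
        (o->mode == MINI_LINE && c == '\n') ||
        o->used == sizeof o->buf)
        return mini_flush(o);
    return 0;
}
```

With MINI_FULL, a couple of short lines simply sit in buf until mini_flush is called; if the process is killed first, they are lost, because the operating system never saw them.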
Finally, you need to know that stdout is fully-buffered by default unless the standard library can determine that stdout is connected to some kind of console device, in which case it is not fully-buffered. (On Unix, it is typically line-buffered.) By contrast, stderr is not fully-buffered. (On Unix, it is typically unbuffered.) You can change the buffering mode, and the buffer size if relevant, by calling setvbuf before the first write operation to the stream.
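One way to avoid depending on those defaults is to flush explicitly. The following is a sketch of the copy loop from my mimic, rewritten as a function that calls fflush after every newline; the name copy_flushing_lines is my own invention, not anything standard.

```c
#include <stdio.h>

/* Copy `in` to `out`, flushing after every newline so that each line leaves
   the process right away, whatever `out` happens to be connected to.
   Returns the number of bytes copied. */
long copy_flushing_lines(FILE *in, FILE *out) {
    long copied = 0;
    int ch;
    while ((ch = getc(in)) != EOF) {
        if (putc(ch, out) == EOF)
            break;
        copied++;
        /* Push the completed line to the OS instead of waiting for a full buffer. */
        if (ch == '\n' && fflush(out) == EOF)
            break;
    }
    fflush(out);  /* hand over any trailing partial line */
    return copied;
}
```

Calling copy_flushing_lines(stdin, stdout) from main should make each line reach the pipe as soon as it is typed. A shorter alternative is setvbuf(stdout, NULL, _IONBF, 0) before the first write; I would not rely on _IOLBF on Windows, since as far as I know Microsoft's runtime treats it like full buffering. Note that this only changes when mimic's output reaches the pipe. The wc.awk in the question prints its totals in an END block, so it would still report only once, when its input ends.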
The above-mentioned defaults are one of the reasons you are encouraged to write error messages to stderr rather than stdout: since stderr is unbuffered, the error message will appear as soon as possible. Also, you should normally put \n at the end of each output line; if standard output is a terminal and therefore line-buffered, the \n will ensure that the line is actually output, rather than languishing in the output buffer.
You can see all of this in action in the above examples. When I just ran ./mimic, leaving stdout mapped to the terminal, the output showed up each time I entered a line. (That also has to do with the way terminal input is handled by the terminal driver, which is another kettle of fish.)
But when I piped mimic into itself, the first mimic's standard output was redirected to a pipe. A pipe is not a terminal, so that mimic's stdout is fully-buffered by default. Since the buffer is longer than the total input, the entire program runs without sending anything to stdout, until the buffer is flushed when stdout is implicitly closed by main returning.
Moreover, if I kill the process (by typing Ctrl-C, for example, or by sending it a SIGKILL signal), then the output buffer is never sent to the operating system, and nothing appears on the console.
If you're writing console apps using standard C library calls, it's very important to understand how stdio output buffering affects the sequence of outputs you see. That's just as true on Windows as on Unix. (Of course, it doesn't apply if you use native Windows or Posix I/O interfaces.)
A semi-newbie to UNIX piping, so apologies if I'm asking anything obvious here. I'm using a program called CCExtractor to grab the closed captions from a video file. It has the option to receive a file from stdin, and it works great if I do the following:
./ccextractor -stdin < myvideofile.wtv
However, I want to try using it "live" - as a video is recording, it'll transcribe the subtitles. From my understanding, < won't do that, as it'll stop as soon as it reaches the current end of the file. Following this answer on Stack Overflow, it seems like:
tail -c +1 -f myvideofile.wtv | ./ccextractor -stdin
should work - but it doesn't process any part of the video at all (it should, at least, work as well as the previous command and parse the existing data). I figured I'd take a step back and use a simple cat:
cat myvideofile.wtv | ./ccextractor -stdin
and that doesn't work either. I was of the belief that the first and third commands ought to be roughly equivalent, but that's obviously not the case. What are the differences, and how could I get this to work?
I'm trying to use libxively to update my feed, but it frequently seems to do nothing. I've got a basic call:
{
xi_datastream_t& ds = mXIFeed.datastreams[2];
::xi_str_copy_untiln(ds.datastream_id, sizeof (ds.datastream_id), "cc-output-power", '\0');
xi_datapoint_t& dp = ds.datapoints[0];
ds.datapoint_count = 1;
::xi_set_value_f32(&dp, mChargeController->outputPower());
}
const xi_context_t* ctx = ::xi_nob_feed_update(mXIContext, &mXIFeed);
it logs the following:
[io/posix/posix_io_layer.c:182 (posix_io_layer_init)] [posix_io_layer_init]
[io/posix/posix_io_layer.c:191 (posix_io_layer_init)] Creating socket...
[io/posix/posix_io_layer.c:202 (posix_io_layer_init)] Socket creation [ok]
Once or twice I saw my Xively developer page show a GET feed, but otherwise, nothing seems to get written. Any suggestions on what I should look at?
I tried to rebuild the library using blocking calls (would be nice if nob didn't mean no blocking calls), but I couldn't figure out how to build it.
Thanks!
EDIT:
I was able to build a synchronous version of the library, and that seems to work. Can anyone verify that the async version works? Is there more to it than simply calling xi_nob_feed_update()?
EDIT 2:
I tried running the async example, but I'm doing something wrong, as it always complains of no data received:
$ bin/asynch_feed_update <my key> <my feed ID> example 1 example 4 example 20 example 58 example 11 example 17
example: 1 7
example: 4 7
example: 20 7
example: 58 7
example: 11 7
example: 17 7
[io/posix_asynch/posix_asynch_io_layer.c:165 (posix_asynch_io_layer_init)] [posix_io_layer_init]
[io/posix_asynch/posix_asynch_io_layer.c:174 (posix_asynch_io_layer_init)] Creating socket...
[io/posix_asynch/posix_asynch_io_layer.c:185 (posix_asynch_io_layer_init)] Setting socket non blocking behaviour...
[io/posix_asynch/posix_asynch_io_layer.c:203 (posix_asynch_io_layer_init)] Socket creation [ok]
No data within five seconds.
The asynchronous version should work. The xi_nob_feed_update() is the right function to make a feed update request.
You have to call process_xively_nob_step() in a loop just after select().
In general, you should follow the asynchronous example.
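The general shape of such a loop might look like the sketch below. It does not use the Xively API: step_fn is a hypothetical callback standing in for whatever "do one non-blocking step" call the library provides, and demo_step and demo_ctx exist only to make the sketch self-contained. It also watches for readability only, whereas a real client may need to wait for writability while the request is being sent.

```c
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical callback type: performs one non-blocking step and returns
   nonzero once the whole request has completed. */
typedef int (*step_fn)(void *ctx);

/* Wait until `fd` is readable, run one step, and repeat. Returns 0 when the
   step reports completion, -1 on a select() error or after `max_idle`
   consecutive timeouts of `timeout_sec` seconds each. */
int pump_until_done(int fd, step_fn step, void *ctx, int timeout_sec, int max_idle) {
    int idle = 0;
    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(fd, &readable);
        /* Rebuilt on every pass because select() may modify the timeout. */
        struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

        int ready = select(fd + 1, &readable, NULL, NULL, &tv);
        if (ready < 0)
            return -1;
        if (ready == 0) {              /* timed out with nothing to read */
            if (++idle >= max_idle)
                return -1;
            continue;
        }
        idle = 0;
        if (step(ctx))
            return 0;
    }
}

/* Illustration only: reads one byte per step from the descriptor in the
   context and reports completion on newline, end of file, or read error. */
struct demo_ctx { int fd; int bytes_seen; };

int demo_step(void *p) {
    struct demo_ctx *c = p;
    char ch;
    if (read(c->fd, &ch, 1) <= 0)
        return 1;
    c->bytes_seen++;
    return ch == '\n';
}
```

The point is that a single call that starts the request is not enough; something has to keep calling the step function each time the socket becomes ready, until it reports that the request has finished.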
Is it possible to graph the query resolution time of bind9 in munin?
I know there is a way to graph it for an Unbound server. Has this already been done for BIND? If not, how do I start writing a munin plugin for it? I'm getting stats from http://127.0.0.1:8053/ on the bind9 server.
I don't believe that "query time" is a function of BIND. About the only time that I see that value (with individual lookups) is when using dig. If you're willing to use that, the following might be a good starting point:
#!/bin/sh
case $1 in
config)
cat <<'EOM'
graph_title Red Hat Query Time
graph_vlabel time
time.label msec
EOM
exit 0;;
esac
echo -n "time.value "
dig www.redhat.com | grep Query | cut -d':' -f2 | cut -d' ' -f2
Note that the second cut statement uses a single space as its delimiter. The text after the colon starts with a space, so field 1 is empty and field 2 is the number. If you save the above as "querytime" and run it at the command line, output should look something like:
root@pi1:~# ./querytime
time.value 189
root@pi1:~# ./querytime config
graph_title Red Hat Query Time
graph_vlabel time
time.label msec
I'm not sure of the value in tracking the above, though. The response time can be affected by whether the query is an initial lookup, whether the answer is cached locally, server load, intervening network congestion, and so on.
Note: the above may be a bit buggy as I've written it on the fly, but it should give you a good starting point. That it returned the above output is a good sign.
In any case, I recommend reading the following before you write your own: http://munin-monitoring.org/wiki/HowToWritePlugins
I am writing a program with a named pipe with multiple readers and multiple writers. The idea is to use that named pipe to create pairs of reader/writer. That is:
A reads the pipe
B writes in the pipe
(vice versa)
Pair A-B created!
In order to ensure that only one process is reading and one is writing, I have used 2 locks with flock. Just like this.
Reader Code:
echo "[JOB $2, Part $REMAINING] Taking next machine..."
VMTAKEN=$((
flock -x 200;
cat $VMPIPE;
)200>$JOINQUEUELOCK)
echo "[JOB $2, Part $REMAINING] Machine $VMTAKEN taken..."
Writer Code:
((
flock -x 200;
echo "[MACHINE $MACHINEID] I am inside the critical section"
echo "$MACHINEID" > $VMPIPE;
echo "[MACHINE $MACHINEID] Going outside the critical section"
)200>$VMQUEUELOCK)
echo "[MACHINE $MACHINEID] Got new Job"
I sometimes get the following problem:
[MACHINE 3] I am inside the critical section
[JOB 1, Part 249] Taking next machine...
[MACHINE 3] Going outside the critical section
[MACHINE 1] I am inside the critical section
[MACHINE 1] Going outside the critical section
[MACHINE 1]: Got new Job
[MACHINE 3]: Got new Job
[JOB 1, Part 249] Machine 3
1 taken...
As you can see, another writer wrote before the reader finished reading. What can I do to get rid of this problem? Should I use an ACK pipe or something?
Thank you in advance
This would be a typical use for semaphores:
Create two semaphores, one for reading processes and one for writing processes, and set each semaphore to the value 1.
Reading processes call sem_wait(3) on the semaphore for readers, which blocks until the semaphore is greater than zero and lowers it to zero once they get it.
Writing processes do the same with the semaphore intended for them.
A controlling process (which may also set up the semaphores initially) could check whether both semaphores are zero and assign the pair.
The reader and writer then release the semaphores (increasing them by 1 again) so the next reader or writer can get the semaphore.
For passing information between reader and writer, shared memory may be used...
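As a rough sketch of the first steps and the release step, here is what a per-role "gate" could look like with POSIX semaphores in C. The function names are invented for illustration, and the sketch leaves out the controlling process and the shared memory entirely.

```c
#include <errno.h>
#include <fcntl.h>
#include <semaphore.h>

/* Open (creating it if necessary) a named semaphore with initial value 1,
   so that at most one process at a time can hold the role it guards.
   Returns SEM_FAILED on error, exactly as sem_open does. */
sem_t *open_gate(const char *name) {
    return sem_open(name, O_CREAT, 0600, 1);
}

/* Block until the gate is free, then take it. Retries if a signal interrupts
   the wait. Returns 0 on success, -1 on error. */
int enter_gate(sem_t *gate) {
    int rc;
    do {
        rc = sem_wait(gate);
    } while (rc != 0 && errno == EINTR);
    return rc;
}

/* Take the gate only if it is free right now. Returns 0 if taken, -1 if not. */
int try_enter_gate(sem_t *gate) {
    return sem_trywait(gate);
}

/* Release the gate so the next waiting process can take it. */
int leave_gate(sem_t *gate) {
    return sem_post(gate);
}
```

A reader would call open_gate with a name it shares with the other readers (for example "/vm_readers", a name I made up), then enter_gate, read from the pipe, and leave_gate; writers would do the same with their own name. Shell scripts cannot call these functions directly, so the scripts in the question would need a small helper program to use them.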