How to transfer data in a Unix pipeline in real time?

As far as I know, if we run, for example, "./program | grep someoutput", grep only processes the output after the program has finished. How can I make it process the data as it arrives?

The shell will spawn both processes at the same time and connect the output of the first to the input of the second. If you have two long-running processes then you'll see both in the process table. There's no intermediate step, no storage of data from the first program. It's a pipe, not a bucket.
Note - there may be some buffering involved.
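A quick way to convince yourself of this is to timestamp lines as they come out of the pipe. This is only a sketch: producer, stamp_lines and arrival_gap are made-up names for illustration, and the one-second pause is arbitrary.

```shell
# Hypothetical producer: emits a line, pauses, then emits another.
producer() {
    echo "first"
    sleep 1
    echo "last"
}

# Consumer: prefixes each line with the second (epoch time) at which it arrived.
stamp_lines() {
    while IFS= read -r line; do
        echo "$(date +%s) $line"
    done
}

# Prints the number of seconds between the arrival of the first and last line.
arrival_gap() {
    producer | stamp_lines |
        awk 'NR == 1 { first = $1 } { last = $1 } END { print last - first }'
}

# At least 1 if lines stream through the pipe as they are written;
# 0 would mean both lines only arrived once the producer had finished.
arrival_gap
```

The shell's built-in echo writes each line straight away, so any delay seen here would come from the pipe itself, not from the producer's own buffering.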

You're wrong.
The pipe receives data immediately, but of course writing to the pipe by the source process can block if the other end (the sink) is not reading the data out fast enough.
Of course, this doesn't necessarily mean that pipes are suitable for "hard real-time" use, but perhaps that's not what you meant.

Actually, grep can be used "in real time": a pipe hands data to the reader as soon as it is written, so something like tail -f /var/log/messages | grep something will work.
If you're not getting the expected output, it's more likely that the preceding command is buffering its output. Take for instance the double grep:
tail -f /var/log/messages | grep something | grep another
The output won't appear immediately, since grep buffers its output when stdout is not connected to a terminal, which means the second grep won't see any data until the buffer is flushed. Forcing line-buffering mode solves this issue:
tail -f /var/log/messages | grep --line-buffered something | grep another
Depending on how the command does its buffering (stdbuf only affects programs that rely on the default C stdio buffering), it may be possible to change the buffering mode with stdbuf, e.g.:
stdbuf -oL ./program | grep something
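To see the effect of line buffering on a middle stage, one rough check is to timestamp what comes out of grep. The names slow_lines, stamp_lines and gap_line_buffered are hypothetical, and the sketch assumes a grep that supports --line-buffered (GNU grep does).

```shell
# Hypothetical slow producer: two matching lines, one second apart.
slow_lines() {
    echo "match one"
    sleep 1
    echo "match two"
}

# Prefixes each line with the second (epoch time) at which it was read.
stamp_lines() {
    while IFS= read -r line; do
        echo "$(date +%s) $line"
    done
}

# Seconds between the first and last line leaving a line-buffered grep.
gap_line_buffered() {
    slow_lines | grep --line-buffered match | stamp_lines |
        awk 'NR == 1 { first = $1 } { last = $1 } END { print last - first }'
}

# At least 1: each match is flushed as soon as grep produces it.
gap_line_buffered
```

Dropping --line-buffered from the middle stage will typically make both lines arrive together, when grep exits and flushes its buffer.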

Related

How can I add a newline in between head and tail in Unix?

In the following Unix command (I’m in zsh), I’d like to have a blank line appear between the head and tail of a long text file for readability.
Here’s the command:
cat LongTextFile.txt | tee >(head) >(tail) >/dev/null
I’m already aware of
(head; echo; tail) < LongTextFile.txt
but I’m wondering if it’s possible to use the tee command.
The process substitutions >(head) >(tail) are not sequenced; they run in parallel. head and tail are running concurrently. tee is reading its standard input and distributing it to those two processes. So there is no concept of "between them" where we could insert a newline.
You're just lucky that when the file is long enough, head has a chance to finish before tail starts outputting anything.
If the file is so small that head and tail overlap, you may get interleaved output, or reordered output, depending on the exact buffering going on.
Here you go, this works in Zsh:
print -l "$(head LongTextFile.txt)" '' "$(tail LongTextFile.txt)"
Try:
seq 100 | { s=$(cat); head <<<"$s"; echo; tail <<<"$s"; }
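If the input comes from a pipe and may be too large (or too binary) to hold in a shell variable, another option is to spool it to a temporary file and then run head and tail one after the other. This is a sketch only; head_gap_tail is a made-up name, and it assumes mktemp is available.

```shell
# Reads stdin, then prints its first 10 lines, a blank line, and its last 10 lines.
head_gap_tail() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"      # spool stdin so it can be read twice
    head "$tmp"       # runs to completion first...
    echo              # ...then the separator...
    tail "$tmp"       # ...then tail, so the order is guaranteed
    rm -f "$tmp"
}

seq 100 | head_gap_tail
```

Because the three commands run in sequence and no longer in parallel, the blank line always lands between the two parts, however short the input is.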

Stdout to both pipe and console?

Is there a way to output to both stdout and to the stdin of another process? That is, have the intermediate stdout be output before it reaches the pipe of the other process?
I know the tee command lets you write to a file and to stdout, but I don't want any files involved here.
This is a little "hacky", I guess, but you can have tee write its extra copy to stderr. Since most programs take their input from stdin, the copy on stderr shows up on the console while the original output still goes through the pipe to the next process.
For example,
cat file.txt | tee /dev/stderr | wc -l
will output the entire contents of file.txt, and then output just the number of lines (wc -l) in file.txt.
Obviously this will only work if stderr outputs where you want it to (like the terminal/console).
Not an ideal solution since it involves using stderr for something it's not necessarily made for but it works.
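One way to sketch the stderr trick as a reusable function is to give tee the path /dev/stderr as its output file, so that tee's own stdout keeps feeding the pipe. The name show_and_count is made up, and this assumes /dev/stderr exists (Linux and most modern Unix systems) and that stderr is a terminal or a pipe and not a regular file.

```shell
# Copies stdin to stderr (for the console) and passes it on to wc -l via stdout.
show_and_count() {
    tee /dev/stderr | wc -l
}

# The three lines appear on stderr; the count appears on stdout.
printf 'a\nb\nc\n' | show_and_count
```

The two streams can still be separated afterwards, e.g. by appending 2>/dev/null to keep only the count.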

unix command 'tail' lost option '--line-buffered'

With the last update of our SuSE Enterprise Linux 11 (now bash 3.2.51(1)-release), the command "tail" seems to have lost its option to stream files:
tail: unrecognized option '--line-buffered'
Our tail is from "GNU coreutils 8.12, March 2013". Is there another, equivalent solution?
As far as can be told by simple googling, tail doesn't appear to have a --line-buffered option; grep does. --line-buffered is useful to force line buffering even when writing to a non-TTY, a typical idiom being:
tail -f FILE | grep --line-buffered REGEXP > output
Here the point of --line-buffered is to prevent grep from buffering its output in 8K chunks, so that matched lines appear in the output file immediately.
tail -f is unbuffered regardless of output type, so it doesn't need a --line-buffered option equivalent to the one in grep. This can be verified by running tail -f somefile | cat and appending a line to the file from another shell. One observes that, despite its standard output being a pipe, tail immediately flushes the newly arrived line.
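The same check can be scripted. In this sketch a named pipe stands in for the anonymous pipe of tail -f somefile | cat, purely so that tail can be stopped by PID afterwards; tail_flush_demo and the file names are made up, and the sleep durations are generous guesses.

```shell
# Prints "flushed while running" if a line appended to the log reaches the
# reader on the other side of the pipe while tail -f is still alive.
tail_flush_demo() {
    dir=$(mktemp -d) || return 1
    log=$dir/log; fifo=$dir/fifo; out=$dir/out
    : > "$log"
    mkfifo "$fifo"
    tail -f "$log" > "$fifo" &     # writer end: tail's stdout is a pipe
    tail_pid=$!
    cat "$fifo" > "$out" &         # reader end, playing the role of "| cat"
    cat_pid=$!
    sleep 1                        # let tail start following the file
    echo "appended line" >> "$log"
    sleep 2                        # allow for tail's polling interval
    if kill -0 "$tail_pid" 2>/dev/null && grep -q "appended line" "$out"; then
        result="flushed while running"
    else
        result="not flushed yet"
    fi
    kill "$tail_pid"               # closing the writer gives cat its EOF
    wait "$tail_pid" "$cat_pid" 2>/dev/null
    rm -rf "$dir"
    echo "$result"
}

tail_flush_demo
```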

cat file | ... vs ... <file

Is there a case of ... or context where cat file | ... behaves differently than ... <file?
When reading from a regular file, cat is in charge of reading the data, performs it as it pleases, and might constrain it in the way it writes it to the pipeline. Obviously, the contents themselves are preserved, but anything else could be tainted. For example: block size and data arrival timing. Additionally, the pipe in itself isn't always neutral: it serves as an additional buffer between the input and ....
Quick and easy way to make the block size issue apparent:
$ cat large-file | pv >/dev/null
5,44GB 0:00:14 [ 393MB/s] [ <=> ]
$ pv <large-file >/dev/null
5,44GB 0:00:03 [1,72GB/s] [=================================>] 100%
Besides the points made by other users: with input redirection from a file, standard input is the file itself, but when piping the output of cat, standard input is a stream carrying the contents of the file. When standard input is the file, the program can seek within it; a pipe does not allow seeking. You can see this by finding a zip file and running the following commands:
zipinfo /dev/stdin < thezipfile.zip
and
cat thezipfile.zip | zipinfo /dev/stdin
The first command will show the contents of the zipfile while the second will show an error, though it is a misleading error because zipinfo does not check the result of the seek call and errors later on.
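If you don't have a zip file handy, a small function can report what kind of object standard input is in each case. This is a sketch: stdin_kind is a made-up name, and it assumes a system where /dev/stdin refers to the process's own file descriptor 0 (as on Linux).

```shell
# Reports whether standard input is a pipe, a regular file, or something else.
stdin_kind() {
    if [ -p /dev/stdin ]; then
        echo "pipe"
    elif [ -f /dev/stdin ]; then
        echo "regular file"
    else
        echo "other"
    fi
}

sample=$(mktemp)
echo "some data" > "$sample"
stdin_kind < "$sample"        # stdin is the file itself, so it is seekable
cat "$sample" | stdin_kind    # stdin is a pipe, so seeking is not possible
rm -f "$sample"
```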
A useless use of cat is always to be avoided. It's like driving with the handbrake on. It wastes CPU cycles for nothing, the OS constantly context switching between the cat process and the next in the pipe. If all the world's useless cats were gone and stopped being invented, reinvented, passed on from father to son, we wouldn't have global warming because we could easily live with 1.21 Gigawatts of power saved.
Thanks. I feel better now. Please join me in my crusade to stamp out useless use of cat on stackoverflow. This site is, as far as I perceive it, a major contribution to the proliferation of useless cats. I don't blame the newbies, but I do want to teach them. Workers and newbies of the world, loosen the handbrakes and save the planet!!!1!
cat will allow you to pipe multiple files in sequentially. Otherwise, < redirection and cat file | produce the same side effects.
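In bash (unless the lastpipe option is in effect), each command in a pipeline runs in a subshell, including the loop on the right of the pipe. Variables set inside that loop are therefore lost once the pipeline ends.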
cat foo | while read line
do
...
done
echo "$line"
versus
while read line
do
...
done < foo
echo "$line"
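A counter makes the difference easier to see than the last value of line. In this sketch, count_via_pipe and count_via_redirect are made-up names, and the behaviour described for the piped version is that of bash with default options (zsh and ksh run the last pipeline stage in the current shell).

```shell
# Counts lines with the loop on the right-hand side of a pipe.
count_via_pipe() {
    count=0
    cat "$1" | while IFS= read -r line; do
        count=$((count + 1))     # in bash this updates a subshell's copy
    done
    echo "$count"
}

# Counts lines with the loop reading from a redirection.
count_via_redirect() {
    count=0
    while IFS= read -r line; do
        count=$((count + 1))     # updates the current shell's variable
    done < "$1"
    echo "$count"
}

sample=$(mktemp)
printf 'one\ntwo\n' > "$sample"
count_via_pipe "$sample"         # bash with default options prints 0
count_via_redirect "$sample"     # prints 2
rm -f "$sample"
```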
One further difference is behavior on a blocking open() of the input file.
For example, assuming input is a FIFO with no writers, one invocation will not spawn any child programs until the input file is opened, while the other will spawn two processes:
prog ... < a_fifo # 'prog' not launched until shell can open file
cat a_fifo | prog ... # 'prog' and 'cat' are running (latter may block on open)
In practice this rarely matters except in contrived circumstances. prog might periodically log or do some cleanup work while waiting for input, for example, which you might want to happen even if no input is available. (Why wouldn't prog be sophisticated enough to open its own input fifo nonblocking?)
cat file | starts up another program (cat) that doesn't have to start in the second case. It also makes it more confusing if you want to use "here documents". But it should behave the same.

How do you resolve issues with named pipes?

I have a binary program* which takes the contents of a supplied file, processes it, and prints the result on the screen through stdout. For an automation script, I would like to use a named pipe to send data to this program and process the output myself. After trying to get the script to work I realized that there is an issue with the binary program accepting data from the named pipe. To illustrate the problem I have outlined several tests using the unix shell.
It is easy to show that the program works by processing an actual data file.
$ binprog file.txt > output.txt
This will result in output.txt containing the processed information from file.txt.
The named pipe (pipe.txt) works as seen by this demonstration.
$ cat pipe.txt > output.txt
$ cat file.txt > pipe.txt
This will result in output.txt containing the data from file.txt after it has been sent through the pipe.
When the binary program is reading from the named pipe instead of the file, things do not work correctly.
$ binprog pipe.txt > output.txt
$ cat file.txt > pipe.txt
In this case output.txt contains no data even after cat and binprog terminate. Using top and ps, I can see binprog "running" and seemingly doing work. Everything executes with no errors.
Why is there no output produced by binprog in this third example?
What are some things I could try to get this working?
[*] The program in question is svm-scale from libsvm. I chose to generalize the examples to keep them clean and simple.
Are you sure the program will work with a pipe? If it needs random access to the input file it won't work. The program will get an error whenever it tries to seek in the input file.
If you know the program is designed to work with pipes, and you're using bash, you can use process substitution to avoid having to explicitly create the named pipe.
binprog <(cat file.txt) > output.txt
Does binprog also accept input on stdin? If so, this might work for you.
cat pipe.txt | binprog > output.txt
cat file.txt > pipe.txt
Edit: Briefly scanned the manpage for svm-scale. Give this a whirl instead:
cat pipe.txt | svm-scale - > output.txt
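For reference, here is the stdin variant as a self-contained sketch. tr stands in for binprog (an assumption: any filter that reads stdin would do), fifo_demo is a made-up name, and the reader is started in the background first because opening a FIFO blocks until the other end is opened too.

```shell
# Sends one line through a named pipe into a filter reading stdin.
fifo_demo() {
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/pipe.txt"
    # Reader side, in the background; tr is a stand-in for the real program.
    cat "$dir/pipe.txt" | tr 'a-z' 'A-Z' > "$dir/output.txt" &
    reader=$!
    # Writer side: returns once the data has been handed to the pipe.
    printf 'hello\n' > "$dir/pipe.txt"
    wait "$reader"               # make sure the filter has finished writing
    cat "$dir/output.txt"
    rm -rf "$dir"
}

fifo_demo    # prints HELLO
```

If this pattern works with a stand-in filter but not with the real program, that points at how the program reads its input and not at the named pipe.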
If binprog is not working well with anything other than a terminal as an input, maybe you need to give it a (pseudo-)terminal (pty) for its input. That is harder to organize, but the expect program is one way of doing that relatively easily. There are discussions of programming with ptys in Advanced Programming in the Unix Environment, 3rd Edn by W Richard Stevens and Stephen A Rago, and in Advanced Unix Programming, 2nd Edn by Marc J Rochkind.
Something else to look at is the output of truss or strace or the local equivalent. These programs log all the system calls made by a process. On Solaris, I'd run:
truss -o binprog.truss binprog
interactively, and see what it does. Then I'd try it with i/o redirection, and then with i/o redirection from the named pipe; there may be some significant differences between what it does, or you may see the system call that is hanging. If you see forks in the truss log file, you would need to add a '-f' flag to follow children.
