command pipe into subshell - unix

What is the difference between
cat dat | tee >(wc -l ) | some other command
and
cat dat | tee file | wc -l
in terms of what is happening under the hood?
I understand the second one: tee splits the stream, writing one copy to a file and another down the pipe. But I am confused by the first one.

The first notation is Bash's process substitution (ksh and zsh have it too, but it is not part of POSIX sh).
As far as tee is concerned, it is simply given a file name (such as /dev/fd/64) to which it writes, in addition to its standard output; that name actually refers to a file descriptor for the write end of a pipe. As far as wc is concerned, it reads its standard input (the read end of the pipe whose write end tee sees as /dev/fd/64) and writes its answer to the standard output of the shell running the pipeline, not to the standard output of tee, which goes down the pipeline.

Since >( is Bash's process substitution,
the first line says:
send the contents of file 'dat' into some other command,
while the process 'wc' runs with its input
connected to a pipe that also receives the contents of 'dat'.
Check "Process Substitution" in the bash man page.

Related

Pipe and fork in unix |?

I was reading about what a pipe is in an operating system, and there's something I don't understand.
Take a sequence of unix pipe-character-separated commands like
cat file | grep "something"
what happens when the pipe | is processed? I understand that a unix pipe is opened through the pipe() function, but I don't see how a 'fork' would take place here in any of the processes involved.
What happens, and how is a fork involved (if at all)?
In your case two fork calls are made: one for the cat command and one for the grep command.
The fork calls are needed because the only standard way in POSIX to execute a program is with the exec family of calls, and those replace the current process image with the image of the loaded program. If the shell didn't fork a new process for each command it executes, the first program run would replace the shell, and the shell would no longer exist.
The pipe is set up so that the write-end of the pipe will be connected to standard output of the first child process (the cat command), and the read-end of the pipe will be connected to standard input of the second child process (the grep command).
All of this is happening behind the scenes in the shell.

How do I run a command(with its output being piped to a file) in the background?

Say I have a program foo, which prints a gazillion lines to the console.
How do I run it in the background, while piping its output to a file?
I tried this
./foo | output.txt&
Doesn't seem to work
Take a look at the nohup utility; it lets you detach a command from the tty (the command ignores the hangup signal when the terminal goes away):
nohup sh -c "./foo > output.txt 2>&1" &
Piping the output of a command to a file does not actually work; you can only redirect it, which is what > output.txt does. Piping makes sense only if what follows is another command that reads its standard input, not a passive file. The additional 2>&1 redirects the command's standard error to wherever standard output points at that moment, so that you have a single output file; otherwise errors would still spill out to the controlling tty. Because redirections are processed left to right, 2>&1 has to come after > output.txt; written the other way round, stderr would still go to the terminal. The command nohup actually runs is a shell (sh -c) because nohup takes a single command; if you wrote a pipeline directly after nohup, only its first stage would be covered.

Force line-buffering of stdout in a pipeline

Usually, stdout is line-buffered. In other words, as long as your printf argument ends with a newline, you can expect the line to be printed instantly. This does not appear to hold when using a pipe to redirect to tee.
I have a C++ program, a, that outputs strings, always \n-terminated, to stdout.
When it is run by itself (./a), everything prints correctly and at the right time, as expected. However, if I pipe it to tee (./a | tee output.txt), it doesn't print anything until it quits, which defeats the purpose of using tee.
I know that I could fix it by adding a fflush(stdout) after each printing operation in the C++ program. But is there a cleaner, easier way? Is there a command I can run, for example, that would force stdout to be line-buffered, even when using a pipe?
You can try stdbuf:
$ stdbuf --output=L ./a | tee output.txt
(big) part of the man page:
-i, --input=MODE adjust standard input stream buffering
-o, --output=MODE adjust standard output stream buffering
-e, --error=MODE adjust standard error stream buffering
If MODE is 'L' the corresponding stream will be line buffered.
This option is invalid with standard input.
If MODE is '0' the corresponding stream will be unbuffered.
Otherwise MODE is a number which may be followed by one of the following:
KB 1000, K 1024, MB 1000*1000, M 1024*1024, and so on for G, T, P, E, Z, Y.
In this case the corresponding stream will be fully buffered with the buffer
size set to MODE bytes.
Keep this in mind, though:
NOTE: If COMMAND adjusts the buffering of its standard streams ('tee' does
for e.g.) then that will override corresponding settings changed by 'stdbuf'.
Also some filters (like 'dd' and 'cat' etc.) don't use streams for I/O,
and are thus unaffected by 'stdbuf' settings.
You are not running stdbuf on tee, you're running it on a, so this shouldn't affect you, unless you set the buffering of a's streams in a's source.
Also, stdbuf is not POSIX; it is part of GNU coreutils.
Try unbuffer (see its man page), which is part of the expect package. You may already have it on your system.
In your case you would use it like this:
unbuffer ./a | tee output.txt
The -p option is for pipeline mode where unbuffer reads from stdin and passes it to the command in the rest of the arguments.
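You can use setlinebuf from stdio.h (a BSD function, also provided by glibc):
setlinebuf(stdout);
This changes the buffering of stdout to "line buffered".
If you need more flexibility, you can use setvbuf, which is standard C; it must be called before any other operation on the stream.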
You may also try to execute your command in a pseudo-terminal using the script command (which should enforce line-buffered output to the pipe)!
script -q /dev/null ./a | tee output.txt # Mac OS X, FreeBSD
script -c "./a" /dev/null | tee output.txt # Linux
Be aware that the script command does not propagate the exit status of the wrapped command back.
The unbuffer command from the expect package, as given in the answer by Paused until further notice, did not work for me the way it was presented.
Instead of using:
./a | unbuffer -p tee output.txt
I had to use:
unbuffer -p ./a | tee output.txt
(-p is for pipeline mode where unbuffer reads from stdin and passes it to the command in the rest of the arguments)
The expect package can be installed on:
MSYS2 with pacman -S expect
Mac OS with brew install expect
Update
I recently had buffering problems with Python inside a shell script (when trying to append a timestamp to its output). The fix was to pass the -u flag to python, like this:
run.sh containing python -u script.py
unbuffer -p /bin/bash run.sh 2>&1 | tee /dev/tty | ts '[%Y-%m-%d %H:%M:%S]' >> somefile.txt
This command will put a timestamp on the output and send it to a file and stdout at the same time.
The ts program (timestamp) can be installed with the moreutils package.
Update 2
Recently I also had problems with grep buffering its output; passing the --line-buffered option to grep made it stop buffering.
If you use the C++ stream classes instead, every std::endl is an implicit flush. With C-style printing, you can either call fflush() as you suggested or change the stream's buffering mode with setvbuf() (or setlinebuf()).
The best answer IMO is grep's --line-buffered option, as stated here:
https://unix.stackexchange.com/a/53445/40003

Unix: Grep on console output

This is my first question on stackoverflow!
I want to have a unix script that will run grep on the console output. Here is what my script does:
1. Telnet into a remote server (I have done this part successfully)
2. On successful login, the remote server displays information on the console. I need to run grep on that console output (need help with this)
So, I need a script to run grep on the output appearing on the console.
Any thoughts??
Thanks,
Puneet
Use SSH instead. It's more secure and far easier to script.
ssh remoteusername@remotehost /path/to/remote/script | grep 'something'
With appropriate key setup, it won't even prompt you for a password.
Have you tried I/O redirection? You could either do
$ your-command > output.txt
and then run grep on that file, or just directly pipe the output through grep like so
$ your-command | grep ...
See this article or google around for similar. There are probably thousands of good articles about this around the web.
Instead of telnet, I would suggest using netcat (nc). You could then pass your login credentials via standard input and grep the standard output (nc prints anything sent by the server on standard output).
nc <host> <port> <auth.txt | grep 'string'
What you want is probably a pipe. You can see it in the answers above: it's the | sign in the commands. It may be hard to find on your keyboard, depending on the layout (admittedly, it is not used very often).
A pipe redirects the output of one command: instead of sending it to the console, it sends it as input to another command.
cmd1 | grep foo is equivalent to running grep foo on the output of cmd1 (replace cmd1 with your own command).
One last thing: you can chain as many pipes as you want. For instance, on my machine I can run ls -ltr | tail -1 | awk '{print $9}' | grep foo to check whether the name of the most recently modified file contains the word foo.

piping in UNIX doubt

In The Unix Programming Environment by K & P, it is written that
" The programs in a pipeline actually run at the same time, not one after another.
This means that programs in a pipeline can be interactive;"
How can programs run at the same time?
For ex: $ who | grep mary | wc -l
How can grep mary execute before who has finished, or wc -l before it
knows the results of the previous programs?
All three programs start. grep and wc wait for input on stdin.
who outputs a line of data, which grep then receives.
If the line matches, grep writes it to stdout, which wc then reads and counts.
In the meantime, who may already have written more data for grep, and so on.
Each program needs the results of the previous one, but it doesn't need all of the results before it can start working, which is why pipelining is feasible.

Resources