Unix concatenation commands - unix

I've got a question here that I dont understand:
cat abc.dat | tee bcd.dat | tr ab ba > cde.dat
In this instance, I understand the translate part, but I’m a little confused as to what the pipe | does, I’ve been told that it takes the stdout of a program with the stdin of another. If I were to try to explain this, correct me if I’m wrong but you’re taking abc.dat’s contents, duplicating the output into bcd.dat, and then taking the content from bcd.dat and translating instances of a and b into b and a respectively, and then taking that and putting it into cde.dat?
The official answer is: abc.dat is copied to bcd.dat. and abc.dat is copied to cde.dat but with 'a’ replaced by 'b' and 'b‘ replaced by 'a'. But why is abc.dat copied into cde.dat instead of bcd.dat? does the pipe not continue?

The "official" answer is poorly worded. Neither tee nor tr know anything about abc.dat; it just happens that what it reads from tee is what tee read from cat, which is what cat read from abc.dat.

The tee utility reads from stdin and writes this firstly to stdout, and secondly to all files given as argument. In your case, stdin is written to stdout, and to file bcd.dat. The pipe behind the tee links the stdout of tee to the stdin of the tr. Although the data is the same, it is not the content of file bcd.dat that is being piped to tr. It is the output of the initial cat, which in turn is the content of file abc.dat.

Related

Why not pipe list of file names into cat?

What is the design rationale that cat doesn't take list of file names from pipe input? Why did the designers choose that the following does not work?
ls *.txt | cat
Instead of this, they chose that we need to pass the file names as argument to cat as:
ls *.txt | xargs cat
When you say ls *.txt | cat doesn't work, you should say that doesn't work as you expect. In fact, that works in the way it was thought to work.
From man:
cat - Concatenate FILE(s), or standard input, to standard output
Suppose the next output:
$ ls *.txt
file1.txt
file2.txt
... the input to cat will be:
file1.txt
file2.txt
...and that's exactly what cat output in the standard output
In some shells, it's equivalent to:
cat <(ls *.txt)
or
ls *.txt > tmpfile; cat tmpfile
So, cat is really working as their designers expected to do so.
On the other hand, what you are expecting is that cat interprets its input as a set of filenames to read and concatenate their content, but when you pipe to cat, that input works as a lonely file.
To make it short, cat is a command, like echo or cp, and few others, which cannot convert pipe redirected input stream into arguments.
So, xargs, is used, to pass the input stream as an argument to the command.
More details here: http://en.wikipedia.org/wiki/Xargs
As a former unix SA, and now, Python developer, I believe I could compare xargs, to StringIO/CStringIO, in Python, as it kind of helps the same way.
When it comes to your question: Why didn't they allow stream input? Here is what I think
Nobody but them could answer this.
I believe, however, than cat is meant to print to stdout the content of a file, while the command echo, was meant to print to stdout the content of a string.
Each of these commands, had a specific role, when created.

In what order does cat choose files to display?

I have the following line in a bash script:
find . -name "paramsFile.*" | xargs -n131072 cat > parameters.txt
I need to make sure the order the files are concatenated in does not change when I use this command. For example, if I run this command twice on the same set of paramsFile.*, parameters.txt should be the same both times. My question is, is this the case? And if it isn't, how can I make sure it is?
Thanks!
Edit: the same question goes for xargs: would that change how the files are fed to cat?
Edit2: as William Pursell pointed out, this question is actually about find. Does find always return files in the same order?
From description in man cat:
The cat utility reads files sequentially, writing them to the standard
output. The file operands are processed in command-line order.
If file is a single dash (`-') or absent, cat reads from the standard input. If file is a UNIX domain socket, cat connects to it
and
then reads it until EOF. This complements the UNIX domain binding capability available in inetd(8).
So yes as long as you pass the files to cat in the same order every time you'll be ok.

Implementing the "more" Unix utility command

i am trying to implement the more command. I want to learn that how can I understand if there is a pipe. For example, if I type from the shell
cat file1 file2 | more
how can I handle that inside the implementation of more?
And is the implementation of more available as open source?
Actually i could not succeed from reading stdin.I've managed doing more file.txt but not cat file | more..
i think i should first read from user and put a buffer than print the buffer. my code contains:
if(argc == 1)
{
fgets(line, 255, 0);
printf("%s", line);
}
but it gives error.
The more syntax is
more [options] [file_name]
If you don't provide a file name, the more command gets input from stdin; you can provide this input (via stdin) using a pipe, for example:
cat file.txt | more
This sends the output of the cat command to more. This is the same as doing:
more file.txt
You don't need to specifically know if there is a pipe or not; you just need to check if a filename was passed as argument to more. If so, the input is considered to be the contents of the file. If not, the input is considered to originate from stdin.
As for the source code, some google searching will take you a long way. Here is some old source code from FreeBSD:
http://svnweb.freebsd.org/base/stable/2.0.5/usr.bin/more/
Or more recente source from the Ubuntu repositories:
http://bazaar.launchpad.net/~vcs-imports/util-linux-ng/trunk/files/head:/text-utils/
suggest u first check the argc whether equal 1, if equal 1 ,you shoule use the stdin as your inpout file handle, so your program can handle the situation as cat file.txt | more

cat file | ... vs ... <file

Is there a case of ... or context where cat file | ... behaves differently than ... <file?
When reading from a regular file, cat is in charge of reading the data, performs it as it pleases, and might constrain it in the way it writes it to the pipeline. Obviously, the contents themselves are preserved, but anything else could be tainted. For example: block size and data arrival timing. Additionally, the pipe in itself isn't always neutral: it serves as an additional buffer between the input and ....
Quick and easy way to make the block size issue apparent:
$ cat large-file | pv >/dev/null
5,44GB 0:00:14 [ 393MB/s] [ <=> ]
$ pv <large-file >/dev/null
5,44GB 0:00:03 [1,72GB/s] [=================================>] 100%
Besides the thing posted by other users, when using input redirection from a file, standard input is the file but when piping the output of cat to the input, standard input is a stream with the contents of the file. When standard input is the file will be able to seek within the file but the pipe will not allow it. You can see this by finding a zip file and running the following commands:
zipinfo /dev/stdin < thezipfile.zip
and
cat thezipfile.zip | zipinfo /dev/stdin
The first command will show the contents of the zipfile while the second will show an error, though it is a misleading error because zipinfo does not check the result of the seek call and errors later on.
A useless use of cat is always to be avoided. It's like driving with the handbrake on. It wastes CPU cycles for nothing, the OS constantly context switching between the cat process and the next in the pipe. If all the world's useless cats were gone and stopped being invented, reinvented, passed on from father to son, we wouldn't have global warming because we could easily live with 1.21 Gigawatts of power saved.
Thanks. I feel better now. Please join me in my crusade to stamp out useless use of cat on stackoverflow. This site is, as far as I perceive it, a major contribution to the proliferation of useless cats. I don't blame the newbies, but I do want to teach them. Workers and newbies of the world, loosen the handbrakes and save the planet!!!1!
cat will allow you to pipe multiple files in sequentially. Otherwise, < redirection and cat file | produce the same side effects.
Pipes cause a subshell to be invoked for the command on the right. This interferes with environment variables.
cat foo | while read line
do
...
done
echo "$line"
versus
while read line
do
...
done < foo
echo "$line"
One further difference is behavior on a blocking open() of the input file.
For example, assuming input is a FIFO with no writers, one invocation will not spawn any child programs until the input file is opened, while the other will spawn two processes:
prog ... < a_fifo # 'prog' not launched until shell can open file
cat a_fifo | prog ... # 'prog' and 'cat' are running (latter may block on open)
In practice this rarely matters except in contrived circumstances. prog might periodically log or do some cleanup work while waiting for input, for example, which you might want to happen even if no input is available. (Why wouldn't prog be sophisticated enough to open its own input fifo nonblocking?)
cat file | starts up another program (cat) that doesn't have to start in the second case. It also makes it more confusing if you want to use "here documents". But it should behave the same.

How do you resolve issues with named pipes?

I have a binary program* which takes the contents of a supplied file, processes it, and prints the result on the screen through stdout. For an automation script, I would like to use a named pipe to send data to this program and process the output myself. After trying to get the script to work I realized that there is an issue with the binary program accepting data from the named pipe. To illustrate the problem I have outlined several tests using the unix shell.
It is easy to show that the program works by processing an actual data file.
$ binprog file.txt > output.txt
This will result in output.txt containing the processed information from file.txt.
The named pipe (pipe.txt) works as seen by this demonstration.
$ cat pipe.txt > output.txt
$ cat file.txt > pipe.txt
This will result in output.txt containing the data from file.txt after it has been sent through the pipe.
When the binary program is reading from the named pipe instead of the file, things do not work correctly.
$ binprog pipe.txt > output.txt
$ cat file.txt > pipe.txt
In this case output.txt contains no data even after cat and binprog terminate. Using top and ps, I can see binprog "running" and seemingly doing work. Everything executes with no errors.
Why is there no output produced by binprog in this third example?
What are some things I could try to get this working?
[*] The program in question is svm-scale from libsvm. I chose to generalize the examples to keep them clean and simple.
Are you sure the program will work with a pipe? If it needs random access to the input file it won't work. The program will get an error whenever it tries to seek in the input file.
If you know the program is designed to work with pipes, and you're using bash, you can use process substitution to avoid having to explicitly create the named pipe.
binprog <(cat file.txt) > output.txt
Does binprog also accept input on stdin? If so, this might work for you.
cat pipe.txt | binprog > output.txt
cat file.txt > pipe.txt
Edit: Briefly scanned the manpage for svm-scale. Give this a whirl instead:
cat pipe.txt | svm-scale - > output.txt
If binprog is not working well with anything other than a terminal as an input, maybe you need to give it a (pseudo-)terminal (pty) for its input. That is harder to organize, but the expect program is one way of doing that relatively easily. There are discussions of programming with pty's in
Advanced Programming in the Unix Environment, 3rd Edn by W Richard Stevens and Stephen A Rago, and in Advanced Unix Programming, 2nd Edn by Marc J Rochkind.
Something else to look at is the output of truss or strace or the local equivalent. These programs log all the system calls made by a process. On Solaris, I'd run:
truss -o binprog.truss binprog
interactively, and see what it does. Then I'd try it with i/o redirection, and then with i/o redirection from the named pipe; there may be some significant differences between what it does, or you may see the system call that is hanging. If you see forks in the truss log file, you would need to add a '-f' flag to follow children.

Resources