difference between grep Vs cat and grep - unix

I would like to know the difference between the two commands below. I understand that 2) should be used, but I want to know the exact sequence of events in 1) and 2).
suppose filename has 200 characters in it
1) cat filename | grep regex
2) grep regex filename

Functionally (in terms of output), those two are the same. The first one creates a separate cat process, which simply sends the contents of the file to its standard output. That output shows up on the standard input of grep, because the shell has connected the two with a pipe.
In that sense grep regex <filename is also equivalent but with one less process.
Where you'll start seeing the difference is in variants where the extra information (the file names) is used by grep, such as with:
grep -n regex filename1 filename2
The difference between that and:
cat filename1 filename2 | grep -n regex
is that the former knows about the individual files whereas the latter sees it as one file (with no name).
While the former may give you:
filename1:7:line with regex in 10-line file
filename2:2:another regex line
the latter will be more like:
7:line with regex in 10-line file
12:another regex line
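The difference described above can be reproduced with a short sketch. The scratch directory and the filler lines are assumptions made for illustration; only the two matching lines come from the example output above.

```shell
# Build two sample files in a scratch directory (contents are illustrative).
workdir=$(mktemp -d)
{
  for i in 1 2 3 4 5 6; do echo "filler $i"; done
  echo "line with regex in 10-line file"
  for i in 8 9 10; do echo "filler $i"; done
} > "$workdir/filename1"
printf '%s\n' "first line" "another regex line" > "$workdir/filename2"

# grep opens the files itself, so it can prefix each match with its file name
# and count line numbers per file.
with_names=$(cd "$workdir" && grep -n regex filename1 filename2)

# grep only sees one anonymous stream, so line numbers run across both files.
without_names=$(cd "$workdir" && cat filename1 filename2 | grep -n regex)

printf '%s\n' "$with_names" "$without_names"
rm -rf "$workdir"
```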
Another executable that acts differently if it knows the file names is wc, the word count program:
$ cat qq.in
1
2
3
$ wc -l qq.in # knows file so prints it
3 qq.in
$ cat qq.in | wc -l # does not know file
3
$ wc -l <qq.in # also does not know file
3

First one:
cat filename | grep regex
Normally cat opens the file and writes its contents to stdout. Here its stdout is the write end of the pipe ('|'). grep's stdin is the read end of the same pipe, so grep reads the data from there and prints every line that matches regex to its own stdout. One detail: the shell starts cat and grep as two separate child processes that run at the same time, and the kernel passes the data from one to the other through the pipe.
Second one:
grep regex filename
Here grep opens and reads the file directly (above, it was reading from the pipe) and prints every line that matches regex to stdout.

If you want to check the actual execution time difference, first create a file with 100000 lines:
user#server ~ $ for i in $(seq 1 100000); do echo line${i} >> test_f; done
user#server ~ $ wc -l test_f
100000 test_f
Now measure:
user#server ~ $ time grep line test_f
#...
real 0m1.320s
user 0m0.101s
sys 0m0.122s
user#server ~ $ time cat test_f | grep line
#...
real 0m1.288s
user 0m0.132s
sys 0m0.108s
As we can see, the difference is not too big...

The outputs are the same, but the two commands get there differently:
-$cat filename | grep regex
Here cat reads the file "filename" and writes its contents to the pipe, and grep then searches that stream for regex; while
-$grep regex filename
Here grep opens the file "filename" itself and searches it for regex.

Functionally they are equivalent, however, the shell will fork two processes for cat filename | grep regex and connect them with a pipe.

Related

Why not pipe list of file names into cat?

What is the design rationale that cat doesn't take list of file names from pipe input? Why did the designers choose that the following does not work?
ls *.txt | cat
Instead of this, they chose that we need to pass the file names as argument to cat as:
ls *.txt | xargs cat
When you say ls *.txt | cat doesn't work, you should say that it doesn't work as you expect. In fact, it works exactly the way it was designed to.
From man:
cat - Concatenate FILE(s), or standard input, to standard output
Suppose the following output:
$ ls *.txt
file1.txt
file2.txt
... the input to cat will be:
file1.txt
file2.txt
...and that's exactly what cat writes to standard output.
In some shells, it's equivalent to:
cat <(ls *.txt)
or
ls *.txt > tmpfile; cat tmpfile
So cat really is working as its designers intended.
On the other hand, what you are expecting is that cat interprets its input as a set of file names to read and concatenate, but when you pipe to cat, that input is treated as the contents of a single file.
In short, cat, like echo, cp, and many others, is a command that does not turn data arriving on its standard input into arguments.
So xargs is used to pass the input stream as arguments to the command.
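A minimal sketch of the two behaviours side by side. The scratch directory and the file contents (alpha, beta) are made up for illustration; the file names follow the example above.

```shell
# Two small sample files in a scratch directory (contents are illustrative).
workdir=$(mktemp -d)
echo "alpha" > "$workdir/file1.txt"
echo "beta"  > "$workdir/file2.txt"

# cat copies its standard input through unchanged, so the output is the names.
names=$(cd "$workdir" && ls *.txt | cat)

# xargs turns each word of its input into an argument, so cat opens the files.
contents=$(cd "$workdir" && ls *.txt | xargs cat)

printf 'piped to cat:\n%s\n\nvia xargs:\n%s\n' "$names" "$contents"
rm -rf "$workdir"
```

Because xargs splits its input on whitespace by default, this form only behaves well for file names without spaces or quotes.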
More details here: http://en.wikipedia.org/wiki/Xargs
As a former unix SA, and now, Python developer, I believe I could compare xargs, to StringIO/CStringIO, in Python, as it kind of helps the same way.
When it comes to your question: Why didn't they allow stream input? Here is what I think
Nobody but them could answer this.
I believe, however, that cat is meant to print the content of a file to stdout, while the command echo was meant to print the content of a string to stdout.
Each of these commands, had a specific role, when created.

Using inverse grep to compare two .txt files

I have two .txt files "test1.txt" and "test2.txt" and I want to use inverse grep (UNIX) to find out all lines in test2.txt that do not contain any of the lines in test1.txt
test1.txt contains only user names, while test2.txt contains longer strings of text. I only want the lines in test2.txt that DO NOT contain the usernames found in test1.txt
Would it be something like?
grep -v test1.txt test2.txt > answer.txt
You were almost there; you just missed one option in your command (i.e. -f).
Your solution should use the -f flag. See below for a sample session demonstrating this.
Demo Session
$ # first file
$ cat a.txt
xxxx yyyy
kkkkkk
zzzzzzzz
$ # second file
$ cat b.txt
line doesnot contain any name
This person is xxxx yyyy good
Another line which doesnot contain any name
Is kkkkkk a good name ?
This name itself is sleeping ...zzzzzzzz
I can't find any other name
Let's try the command now:
$ # -i is used to ignore the case while searching
$ # output contains only lines from second file not containing text for first file lines
$ grep -v -i -f a.txt b.txt
line doesnot contain any name
Another line which doesnot contain any name
I can't find any other name
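One detail worth checking in a case like this: -f treats each line of the pattern file as a regular expression, so a user name containing a character such as . can match more than intended. The sketch below adds the standard -F option (fixed strings) to show the difference; the user names and log lines are invented for illustration.

```shell
# Hypothetical data: one user name that contains a regex metacharacter.
workdir=$(mktemp -d)
printf '%s\n' "j.doe" > "$workdir/test1.txt"
printf '%s\n' "login j.doe ok" "login jxdoe ok" "login alice ok" > "$workdir/test2.txt"

# Without -F the dot matches any character, so the jxdoe line is dropped too.
as_regex=$(grep -v -f "$workdir/test1.txt" "$workdir/test2.txt")

# With -F every line of test1.txt is taken literally.
as_fixed=$(grep -v -F -f "$workdir/test1.txt" "$workdir/test2.txt")

printf 'regex patterns:\n%s\n\nfixed strings:\n%s\n' "$as_regex" "$as_fixed"
rm -rf "$workdir"
```

Also note that an empty line in the pattern file matches every line, so with -v nothing would be printed at all.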
There are probably better ways to do this (i.e. without grep), but here's a solution which will work:
grep -v -P "($(sed ':a;N;$!ba;s/\n/)|(/g' test1.txt))" test2.txt > answer.txt
To explain this:
$(sed ':a;N;$!ba;s/\n/)|(/g' test1.txt) is an embedded sed command that outputs test1.txt with each newline replaced by )|(. That output is inserted into a Perl-style regex (-P) for grep, so grep searches test2.txt for every line of test1.txt and, because of the -v option, returns only the lines of test2.txt that contain none of them.
Which flavor of Unix are you using? That would give us a better idea of what is available to you on the command line. What you currently have will not work; you are looking for the diff command, which compares two files.
You can do the following on OS X 10.6; I have tested this at home.
diff -i -y FILE1 FILE2
diff compares the files. -i ignores case, if that does not matter to you, so Hi and HI are treated as the same. Finally, -y prints the results side by side. If you want to write the output to a file, you could do diff -i -y FILE1 FILE2 >> /tmp/Results.txt

grepping lines from a document using xargs

Let's say I have queries.txt.
queries.txt:
cat
dog
123
Now I want to use them as queries to find lines in myDocument.txt using grep.
cat queries.txt | xargs grep -f myDocument.txt
myDocument has lines like
cat
i have a dog
123
mouse
It should return the first 3 lines, but it doesn't. Instead, grep tries to find them as file names. What am I doing wrong?
Here, you just need:
grep -f queries.txt myDocument.txt
This causes grep to read the regular expressions from the file queries.txt and then apply them to myDocument.txt.
In the xargs version, you were effectively writing:
grep -f myDocument.txt cat dog 123
If you absolutely must use xargs, then you'll need to write:
xargs -I % grep -e % myDocument.txt < queries.txt
This avoids a UUOC (Useless Use of cat) award by redirecting standard input from queries.txt. It uses the -I % option to specify where the replacement text should go in the command line. Using the -e option means that if the pattern is, say, --help, you won't run into problems with (GNU) grep treating that as an option (and therefore printing its help message).
The grep -e option will take a pattern string as an argument. -f treats the argument as a file name of a file with patterns in it.
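Both forms can be tried on the sample data from the question. This is a sketch that recreates queries.txt and myDocument.txt in a scratch directory, which is an assumption about where the files live.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' cat dog 123 > "$workdir/queries.txt"
printf '%s\n' "cat" "i have a dog" "123" "mouse" > "$workdir/myDocument.txt"

# One grep process: every line of queries.txt is a pattern.
one_pass=$(grep -f "$workdir/queries.txt" "$workdir/myDocument.txt")

# One grep process per query: xargs substitutes each input line for %.
per_query=$(xargs -I % grep -e % "$workdir/myDocument.txt" < "$workdir/queries.txt")

printf 'grep -f:\n%s\n\nxargs:\n%s\n' "$one_pass" "$per_query"
rm -rf "$workdir"
```

The two agree on this data, but they are not identical in general: the xargs form prints matches grouped by query, and a line that matches several queries is printed once per query.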

How to save both matching and non-matching from grep

I use grep very often and am familiar with its ability to return matching lines (by default) and non-matching lines (using the -v parameter). However, I want to be able to grep a file once to separate matching and non-matching lines.
If this is not possible, please let me know. I realize I could do this easily in perl or awk, but am curious if it is possible with grep.
Thanks!
If it does NOT have to be grep - this is a single pass split based on a pattern -- pattern found > file1 pattern not found > file2
awk '/pattern/ {print $0 > "file1"; next}{print $0 > "file2"}' inputfile
I had the exact same problem and I wrote a small Perl script for that [1]. It only accepts one argument: the regex to grep input on.
[1] https://gist.github.com/tonejito/c9c0bffd75d8c81483f9107c609439e1
It reads STDIN by line and checks against the given regex, matched lines go to STDOUT and not matched go to STDERR.
I made it this way because this tool sits in the middle of a pipeline and I use shell redirection to save the files on their final location.
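The same stdout/stderr split can be sketched without Perl. This is not the linked script, only an awk stand-in for the idea; split_match is a hypothetical helper name, and the sample lines are invented.

```shell
# split_match REGEX: matching lines go to stdout, all other lines to stderr.
# Hypothetical helper for illustration, not the Perl script linked above.
split_match() {
  # Piping to "cat 1>&2" is a portable way to write to stderr from awk.
  awk -v re="$1" '$0 ~ re { print; next } { print | "cat 1>&2" }'
}

workdir=$(mktemp -d)
printf '%s\n' "apple pie" "banana" "apple tart" "cherry" |
  split_match 'apple' > "$workdir/matched.txt" 2> "$workdir/unmatched.txt"

matched=$(cat "$workdir/matched.txt")
unmatched=$(cat "$workdir/unmatched.txt")
printf 'matched:\n%s\n\nunmatched:\n%s\n' "$matched" "$unmatched"
rm -rf "$workdir"
```

Since the pattern is passed through awk -v, backslashes in it are subject to awk's escape processing, so patterns with backslashes may need doubling.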
Step 1 : Read the file
Step 2 : Replace spaces with a new line and save the result in a temporary file
Step 3 : Get only the lines containing '_' from the temporary file and save them into multiwords.txt
Step 4 : Exclude the lines that contain '_' from the temporary file, then save the result into singlewords.txt
Step 5 : Delete the temporary file
The steps have to run one after another, so they are separated with ; rather than |:
tr ' ' '\n' < file > tmp.txt; grep '_' tmp.txt > multiwords.txt; grep -v '_' tmp.txt > singlewords.txt; rm tmp.txt

Unix Pipes for Command Argument [duplicate]

I am looking for insight as to how pipes can be used to pass standard output as the arguments for other commands.
For example, consider this case:
ls | grep Hello
The structure of grep follows the pattern: grep SearchTerm PathOfFileToBeSearched. In the case I have illustrated, the word Hello is taken as the SearchTerm and the result of ls is used as the file to be searched. But what if I want to switch it around? What if I want the standard output of ls to be the SearchTerm, with the argument following grep being PathOfFileToBeSearched? In a general sense, I want to have control over which argument the pipe fills with the standard output of the previous command. Is this possible, or does it depend on how the script for the command (e.g., grep) was written?
Thank you so much for your help!
grep itself will be built such that if you've not specified a file name, it will open stdin (and thus get the output of ls). There's no real generic mechanism here - merely convention.
If you want the output of ls to be the search term, you can do this via the shell. Make use of a subshell and substitution thus:
$ grep $(ls) filename.txt
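In this scenario ls is run in a subshell, and its stdout is captured and inserted in the command line as arguments for grep. Note that the unquoted substitution is split on whitespace: if the ls output contains spaces or more than one name, only the first word becomes the search term and the remaining words are treated as files to search.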
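One way to sidestep the splitting problem is to quote the substitution. The sketch below assumes a hypothetical directory named listing and made-up file contents. With the quotes, the whole ls output reaches grep as a single argument, and grep treats a newline-separated argument as a list of patterns; -F makes each name a literal string.

```shell
# Hypothetical setup: a directory whose file names are the search terms.
workdir=$(mktemp -d)
mkdir "$workdir/listing"
touch "$workdir/listing/Hello World" "$workdir/listing/notes"
printf '%s\n' "I saw Hello World today" "nothing here" "my notes" > "$workdir/filename.txt"

# Quoted substitution: one argument holding one pattern per line of ls output.
found=$(grep -F -- "$(ls "$workdir/listing")" "$workdir/filename.txt")

printf '%s\n' "$found"
rm -rf "$workdir"
```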
There are basically two options for this: shell command substitution and xargs. Brian Agnew has just written about the former. xargs is a utility which takes its stdin and turns it into arguments of a command to execute. So you could run
ls | xargs -n1 -J % grep -- % PathOfFileToBeSearched
and it would, for each file output by ls, run grep -- filename PathOfFileToBeSearched to grep for the filename output by ls within the other file you specify. This is an unusual xargs invocation; usually it's used to add one or more arguments at the end of a command, while here it should add exactly one argument in a specific place, so I've used the -n and -J options to arrange that. Note that -J is a BSD xargs option; GNU xargs does not have it and uses -I % to place the argument instead. The more common usage would be something like
ls | xargs grep -- term
to search all of the files output by ls for term. Although of course if you just want files in the current directory, you can do this more simply without a pipeline:
grep -- term *
and likewise in your reversed arrangement,
for filename in *; do
grep -- "$filename" PathOfFileToBeSearched
done
There's one important xargs caveat: whitespace characters in the filenames generated by ls won't be handled too well. To handle them, provided your find and xargs support -print0 and -0 (the GNU and BSD versions do), you can use find instead:
find . -mindepth 1 -maxdepth 1 -print0 | xargs -0 -n1 -J % grep -- % PathOfFileToBeSearched
This uses NUL characters to separate filenames instead of whitespace.
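A variant of the NUL-separated approach, as a sketch under a few assumptions: find . prints every name with a leading ./, which would become part of the pattern, so this version lets the shell glob emit the bare names through printf instead. It uses -I %, which both GNU and BSD xargs accept, and adds -F so each name is matched literally. The names directory and the sample lines are invented.

```shell
# Hypothetical setup: file names (one with a space) used as search terms.
workdir=$(mktemp -d)
mkdir "$workdir/names"
touch "$workdir/names/Hello World" "$workdir/names/plain"
printf '%s\n' "see Hello World here" "Hello only" "plain text" > "$workdir/PathOfFileToBeSearched"

# printf '%s\0' * emits each name followed by NUL; xargs -0 splits on NUL only,
# so the space in "Hello World" stays inside a single argument.
# "|| true" is there because xargs exits non-zero if any grep finds nothing.
results=$(cd "$workdir/names" && printf '%s\0' * |
  xargs -0 -I % grep -F -- % "$workdir/PathOfFileToBeSearched") || true

printf '%s\n' "$results"
rm -rf "$workdir"
```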

Resources