Grepping for exact words with UNIX

I want to search for an exact word pattern in Unix.
For example, the file Log.txt contains the following text:
aaa
bbb
cccaaa ---> this should not be counted when grepping for aaa
I am using the following code:
count=$?
count=$(grep -c aaa "$EAT_Setup_BJ3/Log.txt")
The output should be 1, not 2, but with the above code I get 2.
What am I missing? Can anyone help me with this?

Use the whole-word option (-w):
grep -c -w aaa "$EAT_Setup_BJ3/Log.txt"
From the grep manual:
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character.
As noted in a comment, -w is a GNU extension. With a non-GNU grep you may be able to use the word-boundary anchors \< and \> instead:
grep -c '\<aaa\>' "$EAT_Setup_BJ3/Log.txt"
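A quick way to see the difference is to rebuild the three-line Log.txt from the question in a temporary directory and count with and without word matching. This is a sketch that assumes GNU grep (for -w and \< \>); the temporary file stands in for $EAT_Setup_BJ3/Log.txt.

```shell
# Assumes GNU grep; the temporary Log.txt mirrors the sample in the question.
dir=$(mktemp -d)
printf 'aaa\nbbb\ncccaaa\n' > "$dir/Log.txt"

plain=$(grep -c aaa "$dir/Log.txt")            # substring match: "aaa" and "cccaaa"
whole=$(grep -c -w aaa "$dir/Log.txt")         # whole-word match: "aaa" only
anchored=$(grep -c '\<aaa\>' "$dir/Log.txt")   # word-boundary anchors: "aaa" only

rm -r "$dir"
echo "plain=$plain whole=$whole anchored=$anchored"   # plain=2 whole=1 anchored=1
```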

Word-boundary matching is an extension to the standard POSIX grep utility, so it may or may not be available. If you want to search for words portably, I suggest looking into Perl instead, where you would use:
perl -ne 'print if /\baaa\b/' "$EAT_Setup_BJ3/Log.txt"

You can use a word boundary (\b) in the regex to match an exact word. GNU grep accepts \b in both basic and extended mode; the -E flag switches grep to extended regular expressions.
Solution:
grep -E '\baaa\b' "$EAT_Setup_BJ3/Log.txt"

Related

Removing comments from a datafile. What are the differences?

Let's say that you would like to remove comments from a datafile using one of two methods:
cat file.dat | sed -e "s/\#.*//"
cat file.dat | grep -v "#"
How do these individual methods work, and what is the difference between them? Would it also be possible to write the clean data to a new file while preventing any warnings or error messages from ending up in that datafile? If so, how would you go about doing this?
How do these individual methods work, and what is the difference between them?
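They are similar in purpose, but sed and grep are two different commands and they do not produce the same output. Your sed command deletes everything from the first # to the end of each line (it replaces that part with nothing), so a line that starts with # becomes an empty line and a line with a trailing comment keeps the text before the #. grep -v, on the other hand, skips every line that contains a # anywhere.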
You can get more information from the man pages:
man grep:
-v, --invert-match
Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)
man sed:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
Would it also be possible for a person to write the clean data to a new file, while avoiding any possible warnings or error messages to end up in that datafile?
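Yes. Warnings and error messages are written to standard error, which a plain > output_file redirection does not capture anyway; to discard them completely, redirect standard error with 2>/dev/null in either command.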
If so, how would you go about doing this?
Append 2>/dev/null 1>output_file to the command, for example: sed -e "s/\#.*//" file.dat 2>/dev/null 1>output_file
Explanation of the sed command: this is only for understanding. There is no need to pipe cat into sed, since you can run sed -e "s/\#.*//" Input_file directly.
sed -e "      ##start the sed command; -e adds the following script to the commands to be executed
s/            ##s means substitute whatever the regexp that follows matches
\#.*          ##match a # and everything after it up to the end of the line
//"           ##replace the match with nothing (an empty string)
Note that grep -v drops every line that has a # on it. For example:
$ cat file
first
# second
thi # rd
So:
$ grep -v "#" file
first
This drops all lines containing a #, which is usually not what you want. Instead, use:
$ grep -o "^[^#]*" file
first
thi
This keeps the text before the # as the sed command does, but without producing empty lines. From man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
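Running the three approaches side by side on the sample file makes the difference concrete. A sketch, assuming a grep that supports -o (GNU and BSD grep do; it is not in POSIX):

```shell
dir=$(mktemp -d)
printf 'first\n# second\nthi # rd\n' > "$dir/file"

via_grep_v=$(grep -v '#' "$dir/file")        # drops every line containing '#'
via_sed=$(sed 's/#.*//' "$dir/file")          # keeps text before '#', leaves an empty line
via_grep_o=$(grep -o '^[^#]*' "$dir/file")    # keeps text before '#'; empty matches are not printed

rm -r "$dir"
printf '%s\n---\n' "$via_grep_v" "$via_sed" "$via_grep_o"
```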

Unix - How to search for exact string in a file

I am trying to find all files that contain exactly the same ids as those listed in another file, and write the matching file names to a third file. I am using the command below to find the files:
grep -w -f SearchList.txt INFILES* > matched.txt
The ids are listed in the SearchList.txt file, for example:
450462134
747837483
352362362
The INFILES files contain data in this format:
0120171116 07:37:45:828501450462134 000001205
0120171116 07:37:45:828501747837483 000001205
0120171116 07:37:45:828501352362362 000001205
The ids I am looking for are joined to other text at the beginning, but are followed by a space at the end.
I tried putting \b at the beginning and end of each search term in SearchList.txt, but I still get incorrect results.
Any leads to right command will be greatly appreciated.
-bash-3.2$ bash --version
GNU bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu)
-bash-3.2$ grep --version
grep (GNU grep) 2.5.1
The -w option to grep effectively puts \b at both ends of the pattern, but you only want it at the end. One option that works is to append \b to each pattern with sed, e.g.:
sed 's/$/\\b/' SearchList.txt
Since you are only interested in the names of matching files, use grep's -l option. Combine this with grep -f and process substitution (a Bash feature):
grep -lf <(sed 's/$/\\b/' /path/to/SearchList.txt) INFILES*
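To see why -w misses these ids while a trailing \b finds them, here is a sketch with two made-up input files: in INFILE1 the id is glued to other digits in front and followed by a space; in INFILE2 it is followed by one more digit and must not match. It writes the sed output to a temporary file instead of using process substitution, so it also runs under plain sh; it assumes GNU grep, since \b is a GNU extension.

```shell
# Hypothetical data; assumes GNU grep for \b.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '450462134\n' > SearchList.txt
printf '0120171116 07:37:45:828501450462134 000001205\n' > INFILE1   # should match
printf '0120171116 07:37:45:8285014504621345 000001205\n' > INFILE2  # should not match

# -w requires a word boundary on both sides, so it misses INFILE1 too.
with_w=$(grep -l -w -f SearchList.txt INFILE*)

# Append \b to every pattern (boundary at the end only), then list matching files.
sed 's/$/\\b/' SearchList.txt > patterns.txt
with_b=$(grep -l -f patterns.txt INFILE*)

cd / && rm -r "$dir"
printf 'with -w: [%s]\n' "$with_w"
printf 'with trailing boundary: [%s]\n' "$with_b"
```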

grep matches between two files and convert to lower case

I need a fast and efficient approach to the following problem (I am working with many files). For example,
I have two files, file2:
Hello
Goodbye
Salut
Bonjour
and file1
Hello, is it Me you're looking for?
I would like to find any word in file2 that also exists in file1, and then convert that word to lower case.
I can grep the words in a file by doing:
grep -f file2.txt file1.txt
which matches:
Hello
So now I want to convert to
hello
so that the final output is
hello, is it Me you're looking for?
If I match multiple files:
grep -f file2.txt *_infile.txt
the output should be stored in a separate output file for each input file.
I know I can convert to lower case using something like tr, but I only know how to do this on every instance of an uppercase letter. I only want to convert words common between two files from uppercase to lowercase.
Thanks.
I would solve the problem a bit differently.
First, I would mark the matches with grep. --color=always works well, although it is somewhat cumbersome and potentially unreliable to detect. Then I would transform the marked matches with sed or perl:
grep --color=always -F -f file2.txt file1.txt | \
perl -p -e 's/\x1b.*?\[K(.*?)\x1b.*?\[K/\L$1/g'
The cryptic RE matches the coloring escape sequence before the match and the de-coloring escape sequence right after it, capturing everything in between as group 1. The replacement then applies lowercase conversion (\L) to that capture. Note that grep only prints the lines that contain a match. GNU sed can likely do the same, but perl is probably more portable.

grep for special characters in Unix

I have a log file (application.log) which might contain the following string of normal & special characters on multiple lines:
*^%Q&$*&^#$&*!^#$*&^&^*&^&
I want to search for the line number(s) which contains this special character string.
grep '*^%Q&$*&^#$&*!^#$*&^&^*&^&' application.log
The above command doesn't return any results.
What would be the correct syntax to get the line numbers?
Tell grep to treat your pattern as a fixed string with the -F option:
grep -F '*^%Q&$*&^#$&*!^#$*&^&^*&^&' application.log
Add -n to get the line numbers:
grep -Fn '*^%Q&$*&^#$&*!^#$*&^&^*&^&' application.log
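A self-contained check of the fixed-string approach, as a sketch: it writes a made-up application.log with the special string on lines 2 and 4 (once on its own, once embedded in other text) and extracts just the line numbers with cut.

```shell
dir=$(mktemp -d)
needle='*^%Q&$*&^#$&*!^#$*&^&^*&^&'
printf '%s\n' 'normal line' "$needle" 'another line' "prefix $needle suffix" > "$dir/application.log"

# -F: no regex metacharacters; -n: prefix each match with its line number.
lines=$(grep -Fn -- "$needle" "$dir/application.log" | cut -d: -f1)

rm -r "$dir"
printf '%s\n' "$lines"   # 2 and 4, one per line
```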
The one that worked for me (for a pattern that starts with a dash) is:
grep -e '->'
The -e means that the next argument is the pattern, so it won't be interpreted as an option.
From: http://www.linuxquestions.org/questions/programming-9/how-to-grep-for-string-769460/
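A short sketch of the dash problem on a made-up file: without -e, grep would take '->' as an option and fail. Both -e and the conventional -- (end of options) make it a pattern.

```shell
dir=$(mktemp -d)
printf 'a -> b\nplain\n' > "$dir/f.txt"

with_e=$(grep -e '->' "$dir/f.txt")    # -e: the next argument is the pattern
with_dd=$(grep -- '->' "$dir/f.txt")   # --: end of options, so '->' is the pattern

rm -r "$dir"
printf '%s\n' "$with_e" "$with_dd"     # "a -> b" twice
```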
A related note
To grep for carriage return, namely the \r character, or 0x0d, we can do this:
grep -F $'\r' application.log
Alternatively, use printf for POSIX compatibility:
grep -F "$(printf '\r')" application.log
To inspect the result, use hexdump (or less):
$ printf "a\rb" | grep -F $'\r' | hexdump -c
0000000 a \r b \n
Regarding the use of $'\r' and other supported characters, see Bash Manual > ANSI-C Quoting:
Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard
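A sketch of the POSIX variant in a complete script: it finds the line numbers of lines that end in CRLF in a made-up application.log, using printf instead of $'\r' so it runs under plain sh.

```shell
dir=$(mktemp -d)
printf 'unix line\nwindows line\r\nanother unix line\n' > "$dir/application.log"

cr=$(printf '\r')   # command substitution strips trailing newlines, not CR
crlf_lines=$(grep -n -F "$cr" "$dir/application.log" | cut -d: -f1)

rm -r "$dir"
echo "lines with CR: $crlf_lines"   # lines with CR: 2
```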
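Alternatively, escape the regex metacharacters (*, ^ and $) with backslashes and single-quote the pattern so the shell leaves it alone:
grep -n '\*\^%Q&\$\*&\^#\$&\*!\^#\$\*&\^&\^\*&\^&' test.log
1:*^%Q&$*&^#$&*!^#$*&^&^*&^&
8:*^%Q&$*&^#$&*!^#$*&^&^*&^&
14:*^%Q&$*&^#$&*!^#$*&^&^*&^&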
Alternatively, exclude the lines that consist only of letters, digits and spaces; whatever remains contains at least one special character. The -n option adds the line numbers:
grep -vn '^[a-zA-Z0-9 ]*$' application.log
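A sketch of the inverted character-class approach on a made-up three-line log: only the line containing special characters survives, prefixed with its line number. It assumes ASCII input, since the a-z and A-Z ranges can behave differently in some locales.

```shell
dir=$(mktemp -d)
printf '%s\n' 'plain text 123' '*^%Q&$*&^#' 'Another line' > "$dir/application.log"

# -v: keep lines that are NOT made up only of letters, digits and spaces
special=$(grep -vn '^[a-zA-Z0-9 ]*$' "$dir/application.log")

rm -r "$dir"
printf '%s\n' "$special"   # 2:*^%Q&$*&^#
```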
Try vi with the -b (binary) option; this shows special end-of-line characters.
(I typically use it to see Windows line endings in a text file on a Unix OS.)
If you want a scripted solution, vi obviously won't work, so try grep's -f or -e options instead and pipe the result into sed or awk.
From grep man page:
Matcher Selection
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX.)

grep a tab in UNIX

How do I grep tab (\t) in files on the Unix platform?
If using GNU grep, you can use the Perl-style regexp:
grep -P '\t' *
The trick is to put a $ sign before the single quotes (Bash's ANSI-C quoting). It also works for cut and other tools.
grep $'\t' sample.txt
I never managed to make the '\t' metacharacter work with grep.
However, I found two alternative solutions:
Using <Ctrl-V> <TAB> (hitting Ctrl-V then typing tab)
Using awk: foo | awk '/\t/'
From this answer on Ask Ubuntu:
Tell grep to use the regular expressions as defined by Perl (Perl has
\t as tab):
grep -P "\t" <file name>
Use the literal tab character:
grep "^V<tab>" <filename>
Use printf to print a tab character for you:
grep "$(printf '\t')" <filename>
One way (this is with Bash):
grep -P '\t'
-P turns on Perl regular expressions so \t will work.
As user unwind says, it may be specific to GNU grep. The alternative is to literally insert a tab in there if the shell, editor or terminal will allow it.
Another way of inserting the tab literally inside the expression is using the lesser-known $'\t' quotation in Bash:
grep $'foo\tbar' # matches e.g. 'foo<tab>bar'
(Note that if you're matching for fixed strings you can use this with -F mode.)
Sometimes using variables can make the notation a bit more readable and manageable:
tab=$'\t' # `tab=$(printf '\t')` in POSIX
id='[[:digit:]]\+'
name='[[:alpha:]_][[:alnum:]_-]*'
grep "$name$tab$id" # matches e.g. `bob2<tab>323`
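The same variable trick in plain POSIX sh, as a sketch: printf supplies the tab instead of $'\t'. It assumes GNU grep, because \+ in a basic regex is a GNU extension. Of the two sample lines, only the one with a real tab matches.

```shell
tab=$(printf '\t')
id='[[:digit:]]\+'                  # one or more digits (GNU BRE extension)
name='[[:alpha:]_][[:alnum:]_-]*'   # identifier-like name
matches=$(printf 'bob2\t323\nbob2 323\n' | grep -c "$name$tab$id")
echo "$matches"   # 1
```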
There are basically two ways to address it:
(Recommended) Use regular expression syntax supported by grep(1). Modern grep(1) supports two forms of POSIX 1003.2 regex syntax: basic (obsolete) REs and modern (extended) REs. The syntax is described in detail in the re_format(7) and regex(7) man pages, which are part of BSD and Linux systems respectively. GNU grep(1) also supports Perl-compatible REs, as provided by the pcre(3) library.
In regex language the tab symbol is usually encoded by the \t atom. The atom is supported by BSD extended regular expressions (egrep, grep -E on BSD-compatible systems), as well as by Perl-compatible REs (pcregrep, GNU grep -P).
Both basic regular expressions and Linux extended REs apparently have no support for \t. Please consult each UNIX utility's man page to find out which regex language it supports (hence the differences between sed(1), awk(1) and pcregrep(1) regular expressions).
Therefore, on Linux:
$ grep -P '\t' FILE ...
On BSD alike system:
$ egrep '\t' FILE ...
$ grep -E '\t' FILE ...
Pass a literal tab character in the pattern. This is straightforward when you edit a script file (the quotes below contain a literal TAB):
# no tabs for Python please!
grep -q '	' *.py && exit 1
However, when working in an interactive shell you may need to rely on shell and terminal capabilities to type the proper symbol into the line. On most terminals this can be done with the Ctrl+V key combination, which tells the terminal to treat the next input character literally (the V is for "verbatim"):
$ grep '<Ctrl>+<V><TAB>' FILE ...
Some shells offer extra support for this. For example, in bash(1), words of the form $'string' are treated specially:
bash$ grep $'\t' FILE ...
Note, though, that while this is convenient on the command line, it may cause compatibility issues if the script is moved to another platform. Also, be careful with quoting when using these special forms; consult bash(1) for details.
In the Bourne shell (and others), the same behaviour can be emulated with command substitution and printf(1) to construct the proper regex:
$ grep "`printf '\t'`" FILE ...
Use echo to insert the tab for you: grep "$(echo -e \\t)"
grep "$(printf '\t')" worked for me on Mac OS X
A good choice is to use sed.
sed -n '/\t/p' file
Examples (these work in bash, sh, ksh, csh, etc. with GNU sed; some BSD sed versions do not recognise \t):
[~]$ cat testfile
12 3
1 4 abc
xa c
a c\2
1 23
[~]$ sed -n '/\t/p' testfile
xa c
a c\2
[~]$ sed -n '/\ta\t/p' testfile
a c\2
Use gawk: set the field separator to tab (\t) and check the number of fields. If there is more than one field, the line contains at least one tab:
awk -F"\t" 'NF>1' file
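A sketch of the field-count test on made-up input. Because the separator is a single tab, a trailing tab also produces a second (empty) field, so that line is reported too. This should work with any POSIX awk, not only gawk.

```shell
out=$(printf 'no tabs here\nhas\ttab\ntrailing\t\n' | awk -F'\t' 'NF > 1')
printf '%s\n' "$out"   # the "has<TAB>tab" and "trailing<TAB>" lines
```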
Another way that works in ksh, dash, etc.: use printf to insert the TAB:
grep "$(printf 'BEGIN\tEND')" testfile.txt
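On ksh I used the following, where ^I stands for a literal tab typed as Ctrl-V followed by Tab (not the two characters ^ and I):
grep "[^I]" testfile
A simpler way: write your grep and type the Tab key inside the quotes. It works well, at least in ksh (the quotes below contain a literal tab):
grep "	" *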
Using the 'sed-as-grep' method but replacing the tabs with a visible character of your choice is my favourite method, as it clearly shows both which lines contain tabs and where they are within those lines (the p flag prints only the lines where a substitution was made):
sed -n 's/\t/****/gp' file_name
If you wish to make use of line/file info, or other grep options, but also want to see the visible replacement for the tab character, you can achieve this by
grep -[options] -P '\t' file_name | sed 's/\t/\*\*\*\*/g'
As an example:
$ printf 'A\tB\nfoo\tbar\n' > test
$ grep -inH -P '\t' test | sed 's/\t/\*\*\*\*/g'
test:1:A****B
test:2:foo****bar
Note: the above is only useful for viewing file contents to locate tabs; if the objective is to handle tabs as part of a larger script, it doesn't serve any useful purpose.
This works well for AIX. I am searching for lines containing JOINED<\t>ACTIVE
voradmin cluster status | grep JOINED$'\t'ACTIVE
vorudb201 1 MEMBER(g) JOINED ACTIVE
*vorucaf01 2 SECONDARY JOINED ACTIVE
You might want to use grep "$(echo -e '\t')".
The only requirement is that echo can interpret backslash escapes.
These alternative methods for identifying binary characters all work, and I particularly like the ones using awk, as I couldn't quite remember the syntax for single binary characters. However, it is also possible to assign the character to a shell variable in a POSIX-portable way (e.g. TAB=`echo "@" | tr "\100" "\011"`) and then use it everywhere (e.g. grep "$TAB" filename). This works not only for TAB but for other binary characters too: just pass the desired octal value to tr instead of the value for TAB (\011).
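The variable trick can be checked in a few lines of POSIX sh. In this sketch '@' is octal 100, so tr maps it to octal 011 (TAB) or 015 (carriage return); command substitution then strips only the trailing newline from echo. The sample input is made up.

```shell
TAB=$(echo "@" | tr '\100' '\011')   # '@' (octal 100) becomes TAB (octal 011)
CR=$(echo "@" | tr '\100' '\015')    # same trick for another control character (CR)

matches=$(printf 'a\tb\nab\n' | grep -c "$TAB")
echo "$matches"   # 1
```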
The $'\t' notation given in other answers is shell-specific -- it seems to work in bash and zsh but is not universal.
NOTE: The following is for the fish shell and does not work in bash:
In the fish shell, one can use an unquoted \t, for example:
grep \t foo.txt
Or one can use the hex or unicode notations e.g.:
grep \X09 foo.txt
grep \U0009 foo.txt
(these notations are useful for more esoteric characters)
Since these values must be unquoted, one can combine quoted and unquoted values by concatenation:
grep "foo"\t"bar"
You can also use a Perl one-liner instead of grep or grep -P:
perl -ne 'print if /\t/' FILENAME
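You can type
grep \t foo
or
grep '\t' foo
to search for the tab character in the file foo. (Be aware that this depends on your shell and grep: in bash the unquoted \t reaches grep as a plain t, and GNU grep treats '\t' in a basic regex as a t rather than a tab.) You can probably also use other escape codes, though I've only tested \n. Although it's rather time-consuming, and it's unclear why you would want to, in zsh you can also type the tab character, go back to the beginning of the line, type grep, and enclose the tab in quotes.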
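Look for whitespace repeated any number of times with [[:space:]]* (the [[:space:]] class includes the tab):
grep '[[:space:]]*''.''.'
will find something like this:
'the tab' ..
These are single quotes ('), not double ("). Adjacent quoted strings are concatenated by the shell into a single argument. =-) Note that because * also allows zero occurrences, this pattern matches any line with at least two characters; drop the * to require actual whitespace.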
