grep a tab in UNIX - unix

How do I grep tab (\t) in files on the Unix platform?

If using GNU grep, you can use the Perl-style regexp:
grep -P '\t' *

The trick is to use $ sign before single quotes. It also works for cut and other tools.
grep $'\t' sample.txt

I never managed to make the '\t' metacharacter work with grep.
However I found two alternate solutions:
Using <Ctrl-V> <TAB> (hitting Ctrl-V then typing tab)
Using awk: foo | awk '/\t/'

From this answer on Ask Ubuntu:
Tell grep to use the regular expressions as defined by Perl (Perl has
\t as tab):
grep -P "\t" <file name>
Use the literal tab character:
grep "^V<tab>" <filename>
Use printf to print a tab character for you:
grep "$(printf '\t')" <filename>

One way is (this is with Bash)
grep -P '\t'
-P turns on Perl regular expressions so \t will work.
As user unwind says, it may be specific to GNU grep. The alternative is to literally insert a tab in there if the shell, editor or terminal will allow it.

Another way of inserting the tab literally inside the expression is using the lesser-known $'\t' quotation in Bash:
grep $'foo\tbar' # matches eg. 'foo<tab>bar'
(Note that if you're matching for fixed strings you can use this with -F mode.)
Sometimes using variables can make the notation a bit more readable and manageable:
tab=$'\t' # `tab=$(printf '\t')` in POSIX
id='[[:digit:]]\+'
name='[[:alpha:]_][[:alnum:]_-]*'
grep "$name$tab$id" # matches eg. `bob2<tab>323`

There are basically two ways to address it:
(Recommended) Use regular expression syntax supported by grep(1). Modern grep(1) supports two forms of POSIX 1003.2 regex syntax: basic (obsolete) REs, and modern REs. Syntax is described in details on re_format(7) and regex(7) man pages which are part of BSD and Linux systems respectively. The GNU grep(1) also supports Perl-compatible REs as provided by the pcre(3) library.
In regex language the tab symbol is usually encoded by \t atom. The atom is supported by BSD extended regular expressions (egrep, grep -E on BSD compatible system), as well as Perl-compatible REs (pcregrep, GNU grep -P).
Both basic regular expressions and Linux extended REs apparently have no support for the \t. Please consult UNIX utility man page to know which regex language it supports (hence the difference between sed(1), awk(1), and pcregrep(1) regular expressions).
Therefore, on Linux:
$ grep -P '\t' FILE ...
On BSD alike system:
$ egrep '\t' FILE ...
$ grep -E '\t' FILE ...
Pass the tab character into pattern. This is straightforward when you edit a script file:
# no tabs for Python please!
grep -q ' ' *.py && exit 1
However, when working in an interactive shell you may need to rely on shell and terminal capabilities to type the proper symbol into the line. On most terminals this can be done through Ctrl+V key combination which instructs terminal to treat the next input character literally (the V is for "verbatim"):
$ grep '<Ctrl>+<V><TAB>' FILE ...
Some shells may offer advanced support for command typesetting. Such, in bash(1) words of the form $'string' are treated specially:
bash$ grep $'\t' FILE ...
Please note though, while being nice in a command line this may produce compatibility issues when the script will be moved to another platform. Also, be careful with quotes when using the specials, please consult bash(1) for details.
For Bourne shell (and not only) the same behaviour may be emulated using command substitution augmented by printf(1) to construct proper regex:
$ grep "`printf '\t'`" FILE ...

Use echo to insert the tab for you grep "$(echo -e \\t)"

grep "$(printf '\t')" worked for me on Mac OS X

A good choice is to use sed.
sed -n '/\t/p' file
Examples (works in bash, sh, ksh, csh,..):
[~]$ cat testfile
12 3
1 4 abc
xa c
a c\2
1 23
[~]$ sed -n '/\t/p' testfile
xa c
a c\2
[~]$ sed -n '/\ta\t/p' testfile
a c\2
(This answer has been edited following suggestions in comments. Thank you all)

use gawk, set the field delimiter to tab (\t) and check for number of fields. If more than 1, then there is/are tabs
awk -F"\t" 'NF>1' file

+1 way, that works in ksh, dash, etc: use printf to insert TAB:
grep "$(printf 'BEGIN\tEND')" testfile.txt

On ksh I used
grep "[^I]" testfile

The answer is simpler. Write your grep and within the quote type the tab key, it works well at least in ksh
grep " " *

Using the 'sed-as-grep' method, but replacing the tabs with a visible character of personal preference is my favourite method, as it clearly shows both which files contain the requested info, and also where it is placed within lines:
sed -n 's/\t/\*\*\*\*/g' file_name
If you wish to make use of line/file info, or other grep options, but also want to see the visible replacement for the tab character, you can achieve this by
grep -[options] -P '\t' file_name | sed 's/\t/\*\*\*\*/g'
As an example:
$ echo "A\tB\nfoo\tbar" > test
$ grep -inH -P '\t' test | sed 's/\t/\*\*\*\*/g'
test:1:A****B
test:2:foo****bar
EDIT: Obviously the above is only useful for viewing file contents to locate tabs --- if the objective is to handle tabs as part of a larger scripting session, this doesn't serve any useful purpose.

This works well for AIX. I am searching for lines containing JOINED<\t>ACTIVE
voradmin cluster status | grep JOINED$'\t'ACTIVE
vorudb201 1 MEMBER(g) JOINED ACTIVE
*vorucaf01 2 SECONDARY JOINED ACTIVE

You might want to use grep "$(echo -e '\t')"
Only requirement is echo to be capable of interpretation of backslash escapes.

These alternative binary identification methods are totally functional. And, I really like the one's using awk, as I couldn't quite remember the syntaxic use with single binary chars. However, it should also be possible to assign a shell variable a value in a POSIX portable fashion (i.e. TAB=echo "#" | tr "\100" "\011"), and then employ it from there everywhere, in a POSIX portable fashion; as well (i.e grep "$TAB" filename). While this solution works well with TAB, it will also work well other binary chars, when another desired binary value is used in the assignment (instead of the value for the TAB character to 'tr').

The $'\t' notation given in other answers is shell-specific -- it seems to work in bash and zsh but is not universal.
NOTE: The following is for the fish shell and does not work in bash:
In the fish shell, one can use an unquoted \t, for example:
grep \t foo.txt
Or one can use the hex or unicode notations e.g.:
grep \X09 foo.txt
grep \U0009 foo.txt
(these notations are useful for more esoteric characters)
Since these values must be unquoted, one can combine quoted and unquoted values by concatenation:
grep "foo"\t"bar"

You can also use a Perl one-liner instead of grep resp. grep -P:
perl -ne 'print if /\t/' FILENAME

You can type
grep \t foo
or
grep '\t' foo
to search for the tab character in the file foo. You can probably also do other escape codes, though I've only tested \n. Although it's rather time-consuming, and unclear why you would want to, in zsh you can also type the tab character, back to the begin, grep and enclose the tab with quotes.

Look for blank spaces many times [[:space:]]*
grep [[:space:]]*'.''.'
Will find something like this:
'the tab' ..
These are single quotations ('), and not double ("). This is how you make concatenation in grep. =-)

Related

Unix - How to search for exact string in a file

I am trying to search for all files that contain exactly same id as listed in another file and put the file names in another file. I am using below command to find the files.
grep -w -f SearchList.txt INFILES* > matched.txt
The ids are listed in SearchList.txt file
example -
450462134
747837483
352362362
The INFILES files contain data in this format-
0120171116 07:37:45:828501450462134 000001205 0120171116
07:37:45:828501747837483 000001205 0120171116
07:37:45:828501352362362 000001205
The ids which i am looking for are conjoined with other text at the beginning but it has a space at the end.
I tried putting \b at the beginning and end of the search text in SearchList.txt file but i still get incorrect results.
Any leads to right command will be greatly appreciated.
-bash-3.2$ bash --version
GNU bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu)
-bash-3.2$ grep --version
grep (GNU grep) 2.5.1
The -w option to grep actually inserts \b on both ends of the pattern, you only want it at the end. One option that works is to add \b to the patterns with sed, e.g.:
sed 's/$/\\b/' SearchList.txt
As you are only interested in matching filenames you should use the -l option with grep. Now use this together with grep and process substitution:
grep -lf <(sed 's/$/\\b/' /path/to/SearchList.txt) INFILES*

grep for special characters in Unix

I have a log file (application.log) which might contain the following string of normal & special characters on multiple lines:
*^%Q&$*&^#$&*!^#$*&^&^*&^&
I want to search for the line number(s) which contains this special character string.
grep '*^%Q&$*&^#$&*!^#$*&^&^*&^&' application.log
The above command doesn't return any results.
What would be the correct syntax to get the line numbers?
Tell grep to treat your input as fixed string using -F option.
grep -F '*^%Q&$*&^#$&*!^#$*&^&^*&^&' application.log
Option -n is required to get the line number,
grep -Fn '*^%Q&$*&^#$&*!^#$*&^&^*&^&' application.log
The one that worked for me is:
grep -e '->'
The -e means that the next argument is the pattern, and won't be interpreted as an argument.
From: http://www.linuxquestions.org/questions/programming-9/how-to-grep-for-string-769460/
A related note
To grep for carriage return, namely the \r character, or 0x0d, we can do this:
grep -F $'\r' application.log
Alternatively, use printf, or echo, for POSIX compatibility
grep -F "$(printf '\r')" application.log
And we can use hexdump, or less to see the result:
$ printf "a\rb" | grep -F $'\r' | hexdump -c
0000000 a \r b \n
Regarding the use of $'\r' and other supported characters, see Bash Manual > ANSI-C Quoting:
Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard
grep -n "\*\^\%\Q\&\$\&\^\#\$\&\!\^\#\$\&\^\&\^\&\^\&" test.log
1:*^%Q&$&^#$&!^#$&^&^&^&
8:*^%Q&$&^#$&!^#$&^&^&^&
14:*^%Q&$&^#$&!^#$&^&^&^&
You could try removing any alphanumeric characters and space. And then use -n will give you the line number. Try following:
grep -vn "^[a-zA-Z0-9 ]*$" application.log
Try vi with the -b option, this will show special end of line characters
(I typically use it to see windows line endings in a txt file on a unix OS)
But if you want a scripted solution obviously vi wont work so you can try the -f or -e options with grep and pipe the result into sed or awk.
From grep man page:
Matcher Selection
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified
by POSIX.)

Interpret as fixed string/literal and not regex using sed

For grep there's a fixed string option, -F (fgrep) to turn off regex interpretation of the search string.
Is there a similar facility for sed? I couldn't find anything in the man. A recommendation of another gnu/linux tool would also be fine.
I'm using sed for the find and replace functionality: sed -i "s/abc/def/g"
Do you have to use sed? If you're writing a bash script, you can do
#!/bin/bash
pattern='abc'
replace='def'
file=/path/to/file
tmpfile="${TMPDIR:-/tmp}/$( basename "$file" ).$$"
while read -r line
do
echo "${line//$pattern/$replace}"
done < "$file" > "$tmpfile" && mv "$tmpfile" "$file"
With an older Bourne shell (such as ksh88 or POSIX sh), you may not have that cool ${var/pattern/replace} structure, but you do have ${var#pattern} and ${var%pattern}, which can be used to split the string up and then reassemble it. If you need to do that, you're in for a lot more code - but it's really not too bad.
If you're not in a shell script already, you could pretty easily make the pattern, replace, and filename parameters and just call this. :)
PS: The ${TMPDIR:-/tmp} structure uses $TMPDIR if that's set in your environment, or uses /tmp if the variable isn't set. I like to stick the PID of the current process on the end of the filename in the hopes that it'll be slightly more unique. You should probably use mktemp or similar in the "real world", but this is ok for a quick example, and the mktemp binary isn't always available.
Option 1) Escape regexp characters. E.g. sed 's/\$0\.0/0/g' will replace all occurrences of $0.0 with 0.
Option 2) Use perl -p -e in conjunction with quotemeta. E.g. perl -p -e 's/\\./,/gi' will replace all occurrences of . with ,.
You can use option 2 in scripts like this:
SEARCH="C++"
REPLACE="C#"
cat $FILELIST | perl -p -e "s/\\Q$SEARCH\\E/$REPLACE/g" > $NEWLIST
If you're not opposed to Ruby or long lines, you could use this:
alias replace='ruby -e "File.write(ARGV[0], File.read(ARGV[0]).gsub(ARGV[1]) { ARGV[2] })"'
replace test3.txt abc def
This loads the whole file into memory, performs the replacements and saves it back to disk. Should probably not be used for massive files.
If you don't want to escape your string, you can reach your goal in 2 steps:
fgrep the line (getting the line number) you want to replace, and
afterwards use sed for replacing this line.
E.g.
#/bin/sh
PATTERN='foo*[)*abc' # we need it literal
LINENUMBER="$( fgrep -n "$PATTERN" "$FILE" | cut -d':' -f1 )"
NEWSTRING='my new string'
sed -i "${LINENUMBER}s/.*/$NEWSTRING/" "$FILE"
You can do this in two lines of bash code if you're OK with reading the whole file into memory. This is quite flexible -- the pattern and replacement can contain newlines to match across lines if needed. It also preserves any trailing newline or lack thereof, which a simple loop with read does not.
mapfile -d '' < file
printf '%s' "${MAPFILE//"$pat"/"$rep"}" > file
For completeness, if the file can contain null bytes (\0), we need to extend the above, and it becomes
mapfile -d '' < <(cat file; printf '\0')
last=${MAPFILE[-1]}; unset "MAPFILE[-1]"
printf '%s\0' "${MAPFILE[#]//"$pat"/"$rep"}" > file
printf '%s' "${last//"$pat"/"$rep"}" >> file
perl -i.orig -pse 'while (($i = index($_,$s)) >= 0) { substr($_,$i,length($s), $r)}'--\
-s='$_REQUEST['\'old\'']' -r='$_REQUEST['\'new\'']' sample.txt
-i.orig in-place modification with backup.
-p print lines from the input file by default
-s enable rudimentary parsing of command line arguments
-e run this script
index($_,$s) search for the $s string
substr($_,$i,length($s), $r) replace the string
while (($i = index($_,$s)) >= 0) repeat until
-- end of perl parameters
-s='$_REQUEST['\'old\'']', -r='$_REQUEST['\'new\'']' - set $s,$r
You still need to "escape" ' chars but the rest should be straight forward.
Note: this started as an answer to How to pass special character string to sed hence the $_REQUEST['old'] strings, however this question is a bit more appropriately formulated.
You should be using replace instead of sed.
From the man page:
The replace utility program changes strings in place in files or on the
standard input.
Invoke replace in one of the following ways:
shell> replace from to [from to] ... -- file_name [file_name] ...
shell> replace from to [from to] ... < file_name
from represents a string to look for and to represents its replacement.
There can be one or more pairs of strings.

How do I perform a recursive directory search for strings within files in a UNIX TRU64 environment?

Unfortunately, due to the limitations of our Unix Tru64 environment, I am unable to use the GREP -r switch to perform my search for strings within files across multiple directories and sub directories.
Ideally, I would like to pass two parameters. The first will be the directory I want my search is to start on. The second is a file containing a list of all the strings to be searched. This list will consist of various directory path names and will include special characters:
ie:
/aaa/bbb/ccc
/eee/dddd/ggggggg/
etc..
The purpose of this exercise is to identify all shell scripts that may have specific hard coded path names identified in my list.
There was one example I found during my investigations that perhaps comes close, but I am not sure how to customize this to accept a file of string arguments:
eg: find etb -exec grep test {} \;
where 'etb' is the directory and 'test', a hard coded string to be searched.
This should do it:
find dir -type f -exec grep -F -f strings.txt {} \;
dir is the directory from which searching will commence
strings.txt is the file of strings to match, one per line
-F means treat search strings as literal rather than regular expressions
-f strings.txt means use the strings in strings.txt for matching
You can add -l to the grep switches if you just want filenames that match.
Footnote:
Some people prefer a solution involving xargs, e.g.
find dir -type f -print0 | xargs -0 grep -F -f strings.txt
which is perhaps a little more robust/efficient in some cases.
By reading, I assume we can not use the gnu coreutil, and egrep is not available.
I assume (for some reason) the system is broken, and escapes do not work as expected.
Under normal situations, grep -rf patternfile.txt /some/dir/ is the way to go.
a file containing a list of all the strings to be searched
Assumptions : gnu coreutil not available. grep -r does not work. handling of special character is broken.
Now, you have working awk ? no ?. It makes life so much easier. But lets be on the safe side.
Assume : working sed ,one of od OR hexdump OR xxd (from vim package) is available.
Lets call this patternfile.txt
1. Convert list into a regexp that grep likes
Example patternfile.txt contains
/foo/
/bar/doe/
/root/
(example does not print special char, but it's there.) we must turn it into something like
(/foo/|/bar/doe/|/root/)
Assuming echo -en command is not broken, and xxd , or od, or hexdump is available,
Using hexdump
cat patternfile.txt |hexdump -ve '1/1 "%02x \n"' |tr -d '\n'
Using od
cat patternfile.txt |od -A none -t x1|tr -d '\n'
and pipe it into (common for both hexdump and od)
|sed 's:[ ]*0a[ ]*$::g'|sed 's: 0a:\\|:g' |sed 's:^[ ]*::g'|sed 's:^: :g' |sed 's: :\\x:g'
then pipe result into
|sed 's:^:\\(:g' |sed 's:$:\\):g'
and you have a regexp pattern that is escaped.
2. Feed the escaped pattern into broken regexp
Assuming the bare minimum shell escape is available,
we use grep "$(echo -en "ESCAPED_PATTERN" )" to do our job.
3. To sum it up
Building a escaped regexp pattern (using hexdump as example )
grep "$(echo -en "$( cat patternfile.txt |hexdump -ve '1/1 "%02x \n"' |tr -d '\n' |sed 's:[ ]*0a[ ]*$::g'|sed 's: 0a:\\|:g' |sed 's:^[ ]*::g'|sed 's:^: :g' |sed 's: :\\x:g'|sed 's:^:\\(:g' |sed 's:$:\\):g')")"
will escape all characters and enclose it with (|) brackets so a regexp OR match will be performed.
4. Recrusive directory lookup
Under normal situations, even when grep -r is broken, find /dir/ -exec grep {} \; should work.
Some may prefer xargs instaed (unless you happen to have buggy xargs).
We prefer find /somedir/ -type f -print0 |xargs -0 grep -f 'patternfile.txt' approach, but since
this is not available (for whatever valid reason),
we need to exec grep for each file,and this is normaly the wrong way.
But lets do it.
Assume : find -type f works.
Assume : xargs is broken OR not available.
First, if you have a buggy pipe, it might not handle large number of files.
So we avoid xargs in such systems (i know, i know, just lets pretend it is broken ).
find /whatever/dir/to/start/looking/ -type f > list-of-all-file-to-search-for.txt
IF your shell handles large size lists nicely,
for file in cat list-of-all-file-to-search-for.txt ; do grep REGEXP_PATTERN "$file" ;
done ; is a nice way to get by. Unfortunetly, some systems do not like that,
and in that case, you may require
cat list-of-all-file-to-search-for.txt | split --help -a 4 -d -l 2000 file-smaller-chunk.part.
to turn it into smaller chunks. Now this is for a seriously broken system.
then a for file in file-smaller-chunk.part.* ; do for single_line in cat "$file" ; do grep REGEXP_PATTERN "$single_line" ; done ; done ;
should work.
A
cat filelist.txt |while read file ; do grep REGEXP_PATTERN $file ; done ;
may be used as workaround on some systems.
What if my shell doe not handle quotes ?
You may have to escape the file list beforehand.
It can be done much nicer in awk, perl, whatever, but since we restrict our selves to
sed, lets do it.
We assume 0x27, the ' code will actually work.
cat list-of-all-file-to-search-for.txt |sed 's#['\'']#'\''\\'\'\''#g'|sed 's:^:'\'':g'|sed 's:$:'\'':g'
The only time I had to use this was when feeding output into bash again.
What if my shell does not handle that ?
xargs fails , grep -r fails , shell's for loop fails.
Do we have other things ? YES.
Escape all input suitable for your shell, and make a script.
But you know what, I got board, and writing automated scripts for csh just seems
wrong. So I am going to stop here.
Take home note
Use the tool for the right job. Writing a interpreter on bc is perfectly
capable, but it is just plain wrong. Install coreutils, perl, a better grep
what ever. makes life a better thing.

How can I grep for a string that begins with a dash/hyphen?

I want to grep for the string that starts with a dash/hyphen, like -X, in a file, but it's confusing this as a command line argument.
I've tried:
grep "-X"
grep \-X
grep '-X'
Use:
grep -- -X
Documentation
Related: What does a bare double dash mean? (thanks to nutty about natty).
The dash is a special character in Bash as noted at http://tldp.org/LDP/abs/html/special-chars.html#DASHREF. So escaping this once just gets you past Bash, but Grep still has it's own meaning to dashes (by providing options).
So you really need to escape it twice (if you prefer not to use the other mentioned answers). The following will/should work
grep \\-X
grep '\-X'
grep "\-X"
One way to try out how Bash passes arguments to a script/program is to create a .sh script that just echos all the arguments. I use a script called echo-args.sh to play with from time to time, all it contains is:
echo $*
I invoke it as:
bash echo-args.sh \-X
bash echo-args.sh \\-X
bash echo-args.sh "\-X"
You get the idea.
grep -e -X will do the trick.
grep -- -X
grep \\-X
grep '\-X'
grep "\-X"
grep -e -X
grep [-]X
I dont have access to a Solaris machine, but grep "\-X" works for me on linux.
The correct way would be to use "--" to stop processing arguments, as already mentioned. This is due to the usage of getopt_long (GNU C-function from getopt.h) in the source of the tool.
This is why you notice the same phenomena on other command-line tools; since most of them are GNU tools, and use this call,they exhibit the same behavior.
As a side note - getopt_long is what gives us the cool choice between -rlo and --really_long_option and the combination of arguments in the interpreter.
If you're using another utility that passes a single argument to grep, you can use:
'[-]X'
you can use nawk
$ nawk '/-X/{print}' file
None of the answers not helped me (ubuntu 20.04 LTS).
I found a bit another option:
My case:
systemctl --help | grep -w -- --user
-w will match a whole word.
-- means end of command arguments (to mark -w as not part of the grep command)
ls -l | grep "^-"
Hope this one would serve your purpose.
grep "^-X" file
It will grep and pick all the lines form the file.
^ in the grep"^" indicates a line starting with

Resources