Using grep to find a binary pattern in a file

Previously, I was able to find binary patterns in files using grep with
grep -a -b -o -P '\x01\x02\x03' <file>
By find I mean I was able to get the byte position of the pattern in the file. But when I tried doing this with the latest version of grep (v2.16) it no longer worked.
Specifically, I can manually verify that the pattern is present in the file but grep does not find it. Strangely, some patterns are found correctly but not others. For example, in a test file
000102030405060708090a0b0c0e0f
'\x01\x02' is found but not '\x07\x08'.
Any help in clarifying this behavior is highly appreciated.
Update: The above example does not show the described behavior. Here are the commands that exhibit the problem
printf `for (( x=0; x<256; x++ )); do printf "\x5cx%02x" $x; done` > test
for (( x=$((0x70)); x<$((0x8f)); x++ )); do
p=`printf "\'\x5cx%02x\x5cx%02x\'" $x $((x+1))`
echo -n $p
echo $p test | xargs grep -c -a -o -b -P | cut -d: -f1
done
The first line creates a file with all possible bytes from 0x00 to 0xff in a sequence. The second line counts the number of occurrences of pairs of consecutive byte values in the range 0x70 to 0x8f. The output I get is
'\x70\x71'1
'\x71\x72'1
'\x72\x73'1
'\x73\x74'1
'\x74\x75'1
'\x75\x76'1
'\x76\x77'1
'\x77\x78'1
'\x78\x79'1
'\x79\x7a'1
'\x7a\x7b'1
'\x7b\x7c'1
'\x7c\x7d'1
'\x7d\x7e'1
'\x7e\x7f'1
'\x7f\x80'0
'\x80\x81'0
'\x81\x82'0
'\x82\x83'0
'\x83\x84'0
'\x84\x85'0
'\x85\x86'0
'\x86\x87'0
'\x87\x88'0
'\x88\x89'0
'\x89\x8a'0
'\x8a\x8b'0
'\x8b\x8c'0
'\x8c\x8d'0
'\x8d\x8e'0
'\x8e\x8f'0
Update: The same pattern occurs for single-byte patterns -- no bytes with value greater than 0x7f are found.

The results may depend on your current locale. To avoid this, use:
LC_ALL=C grep -P "<binary pattern>" <file>
where LC_ALL=C overrides your current locale so that grep matches byte by byte. Otherwise, patterns with non-ASCII "characters" such as \xff will not match.
For example, this fails to match because (at least in my case) the environment has LANG=en_US.UTF-8:
$ printf '\x41\xfe\n' | grep -P '\xfe'
whereas this succeeds:
$ printf '\x41\xfe\n' | LC_ALL=C grep -P '\xfe'
A?
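As a minimal sketch, the byte-offset search from the question can be wrapped in a small function with the locale forced to C through LC_ALL=C. This assumes GNU grep built with PCRE support (-P); the name find_bytes is hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# find_bytes PATTERN FILE
# Print the byte offset of every match of a PCRE byte pattern in FILE.
# find_bytes is a hypothetical helper name; it requires GNU grep with -P.
find_bytes() {
    # LC_ALL=C makes grep treat the input as raw bytes instead of UTF-8 text.
    # -a: treat binary as text, -b: print byte offsets, -o: print only matches.
    LC_ALL=C grep -a -b -o -P -- "$1" "$2" | LC_ALL=C cut -d: -f1
}
```

For the test file from the question, find_bytes '\x80\x81' test should then report the offset of that byte pair instead of finding nothing.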

Related

What is the easiest way for grepping the 'man grep' for flags

I do use grep a lot, but I would love to improve a bit.
Regarding the question. I wanted to narrow the man entry to find the explanation of what the -v in grep -v 'pattern' filename stood for, mainly this:
-v, --invert-match
Selected lines are those not matching any of the specified patterns.
Thus, to find the next five lines after the line which contains -v I tried:
man grep | grep -A 5 -v
and
man grep | grep -A 5 '-v'
but they return:
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
[-e pattern] [-f file] [--binary-files=value] [--color=when]
[--context[=num]] [--directories=action] [--label] [--line-buffered]
[--null] [pattern] [file ...]
This confuses me since:
man grep | grep -A 5 'Selected'
and
man grep | grep -A 5 Selected
do work.
What is wrong in my approach? Is there any easier way to achieve what I need?

One approach is to parse the Info documents for the command directly. If you run info grep (or other command) you will often find much more detailed and better-structured documentation, which will let you pin-point just the section you need.
Here's a function that will print out the relevant Info section for an option/variable/etc:
info_search() {
info --subnodes "$1" -o - 2>&- \
| awk -v RS='' "/(^|\n)(‘|'|\`)$2((,|\[| ).*)?(’|')\n/"
}
This should work on Linux/macOS/BSD. Output is like:
$ info_search grep -v
‘-v’
‘--invert-match’
Invert the sense of matching, to select non-matching lines. (‘-v’
is specified by POSIX.)
$ info_search gawk RS
'RS == "\n"'
Records are separated by the newline character ('\n'). In effect,
every line in the data file is a separate record, including blank
...
$ info_search bash -i
`-i'
Force the shell to run interactively. Interactive shells are
...
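As for why the original attempts fail: the shell removes the quotes, so grep receives -v as an option in both cases and is left without a pattern. A small sketch of the usual fix, passing the pattern through -e (the sample text is a made-up stand-in for the man page):

```shell
#!/bin/sh
# Made-up sample text standing in for the output of "man grep".
sample=$(printf '%s\n' \
    'intro' \
    '-v, --invert-match' \
    'Selected lines are those not matching.' \
    'other')

# -e marks the next argument as the pattern, even if it starts with a hyphen.
result=$(printf '%s\n' "$sample" | grep -A 1 -e '-v')
printf '%s\n' "$result"
```

With the real man page this becomes man grep | grep -A 5 -e '-v'. Ending the options with a double hyphen, as in grep -A 5 -- '-v', has the same effect.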

strange echo output

Can anybody explain this behaviour of the bash shell which is driving me nuts
[root#ns1 bin]# export test=`whois -h whois.lacnic.net 187.14.6.108 | grep -i inetnum: | awk '{print $2}'`
[root#ns1 bin]# echo $test
187.12/14
[root#ns1 bin]# echo "iptables -I INPUT -s $test -J DROP"
-J DROP -I INPUT -s 187.12/14
[root#ns1 bin]#
Why is my echo screwed up? It is being changed by the contents of $test.
If you change $test to "ABC" all is fine. Is it related to the slash?

Why is my echo screwed up? It is being changed by the contents of
$test.
Because your test contains a carriage return. Remove it:
test=$(whois -h whois.lacnic.net 187.14.6.108 | grep -i inetnum: | awk '{print $2}' | tr -d '\r')

Your test contains something like
1234567 -I INPUT -s 187.12/14\r-J DROP
which, due to the carriage return, is visible only as
-J DROP -I INPUT -s 187.12/14
The CR moves the cursor to the start-of-line, where it then overwrites previous characters.
You could try
echo "$test" | od -bc
to verify this.

This is almost certainly a carriage return. echo is doing its job correctly and emitting the string to your terminal; the problem is that your terminal is treating part of the string as a command for it to follow (specifically, a CR character, $'\r', telling it to send the cursor to the beginning of the current line).
If you want to see the contents of $test in a way which doesn't allow your terminal to interpret escape sequences or other control characters, run the following (note that the %q format string is a bash extension, not available in pure-POSIX systems):
printf '%q\n' "$test"
This will show you the precise contents, formatted and escaped for use by the shell, which should make it clear why they are problematic.
To remove $'\r', which is almost certainly the character giving you trouble, you can run the following parameter expansion:
test=${test//$'\r'/}
Unlike solutions that pipe through an extra process (such as tr), this happens inside the already-running bash shell, and is thus more efficient.
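The two steps can be tried together in a short bash sketch. The value assigned to test is a hypothetical stand-in for the whois output, with a carriage return appended by hand:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the whois output: a value ending in a carriage return.
test=$'187.12/14\r'

# %q (a bash extension) prints the value with control characters escaped,
# so the otherwise invisible carriage return becomes visible.
printf '%q\n' "$test"

# Strip every carriage return inside the shell, without forking tr.
test=${test//$'\r'/}

# The echoed command line is no longer overwritten by the terminal.
echo "iptables -I INPUT -s $test -J DROP"
```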

How to copy files in shell that do not end with a certain file extension

For example copy all files that do not end with .txt

With the extglob option enabled (shopt -s extglob), Bash accepts a negated pattern:
cp !(*.txt) /some/other/place

You can use ls with grep's -v option (this breaks on file names containing whitespace):
for i in $(ls | grep -v '\.txt$')
do
cp -- "$i" "$dest_dir"
done

Depending on how many assumptions you can afford to make about the characters in the file names, it might be as simple as:
cp $(ls | grep -v '\.txt$') /some/other/place
If that won't work for you, then maybe find ... -print0 | xargs -0 cp ... can be used instead (though that has issues - because the destination goes at the end of the argument list).
On MacOS X, xargs has an option -J that supports what is needed:
-J replstr
If this option is specified, xargs will use the data read from standard input to replace the first occurrence of replstr instead of appending
that data after all other arguments. This option will not affect how many arguments will be read from input (-n), or the size of the
command(s) xargs will generate (-s). The option just moves where those arguments will be placed in the command(s) that are executed. The
replstr must show up as a distinct argument to xargs. It will not be recognized if, for instance, it is in the middle of a quoted string.
Furthermore, only the first occurrence of the replstr will be replaced. For example, the following command will copy the list of files and
directories which start with an uppercase letter in the current directory to destdir:
/bin/ls -1d [A-Z]* | xargs -J % cp -rp % destdir
It appears the GNU xargs does not have -J but does have the related but slightly restrictive -I option (which is also present in MacOS X):
-I replace-str
Replace occurrences of replace-str in the initial-arguments with
names read from standard input. Also, unquoted blanks do not
terminate input items; instead the separator is the newline
character. Implies -x and -L 1.
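A minimal sketch of the find and xargs combination with -I, which places each file name before the destination. It assumes a find with -maxdepth and -print0 and an xargs with -0 (both GNU and BSD versions provide them); copy_with_xargs is a hypothetical name.

```shell
#!/bin/sh
# copy_with_xargs DEST
# Copy every regular file in the current directory that does not end in .txt
# into DEST. Assumes find supports -maxdepth/-print0 and xargs supports -0.
# copy_with_xargs is a hypothetical helper name.
copy_with_xargs() {
    find . -maxdepth 1 -type f ! -name '*.txt' -print0 |
        xargs -0 -I '{}' cp -- '{}' "$1"
}
```

Because -I runs one cp per file name, this is slower than a single cp invocation but copes with spaces and newlines in names.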

You can rely on:
find . -not -name "*.txt"
By using:
find -x . -not -name "*.txt" -d 1 -exec cp '{}' toto/ \;
This copies all files in the current directory that are not .txt files to a subdirectory toto/. The -d 1 is used to prevent recursion here.

Either do:
for f in $(ls | grep -v "\.txt$")
do
cp -- "$f" ⟨destination-directory⟩
done
or if you have a huge amount of files:
find . ! -name . -prune -type f ! -name "*.txt" -exec cp -- "{}" ⟨destination-directory⟩ \;
Two things here to comment on. One is the use of the double hyphen in the invocation of cp, and the other is the quoting of $f. The first guards against "wacky" filenames that begin with a hyphen and might be interpreted as options. The second guards against filenames with spaces (or whatever is in IFS) in them.
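The same two precautions can be kept without parsing the output of ls at all. The following is a sketch in plain POSIX sh that relies on a glob and a case pattern; copy_non_txt is a hypothetical name, and hidden files (names starting with a dot) are not matched by the glob.

```shell
#!/bin/sh
# copy_non_txt DEST
# Copy every regular file in the current directory whose name does not
# end in .txt into DEST. copy_non_txt is a hypothetical helper name.
copy_non_txt() {
    dest=$1
    for f in ./*; do
        case $f in
            *.txt) continue ;;      # skip names ending in .txt
        esac
        [ -f "$f" ] || continue     # skip directories and an unmatched glob
        cp -- "$f" "$dest"/
    done
}
```

The ./ prefix already keeps names that start with a hyphen from being read as options; the double hyphen is kept only as a second line of defence.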

In zsh:
setopt extendedglob
cp ^*.txt /some/folder
(if you just want files)...
cp ^*.txt(.) /some/folder
More information on zsh globbing here and here.

I would do it like this, where destination is the destination directory:
ls | grep -v "\.txt$" | xargs cp -t destination
Edit: added "-t" thanks to the comments

How do I perform a recursive directory search for strings within files in a UNIX TRU64 environment?

Unfortunately, due to the limitations of our Unix Tru64 environment, I am unable to use the GREP -r switch to perform my search for strings within files across multiple directories and sub directories.
Ideally, I would like to pass two parameters. The first will be the directory where I want my search to start. The second is a file containing a list of all the strings to be searched. This list will consist of various directory path names and will include special characters:
ie:
/aaa/bbb/ccc
/eee/dddd/ggggggg/
etc..
The purpose of this exercise is to identify all shell scripts that may have specific hard coded path names identified in my list.
There was one example I found during my investigations that perhaps comes close, but I am not sure how to customize this to accept a file of string arguments:
eg: find etb -exec grep test {} \;
where 'etb' is the directory and 'test', a hard coded string to be searched.

This should do it:
find dir -type f -exec grep -F -f strings.txt {} \;
dir is the directory from which searching will commence
strings.txt is the file of strings to match, one per line
-F means treat search strings as literal rather than regular expressions
-f strings.txt means use the strings in strings.txt for matching
You can add -l to the grep switches if you just want filenames that match.
Footnote:
Some people prefer a solution involving xargs, e.g.
find dir -type f -print0 | xargs -0 grep -F -f strings.txt
which is perhaps a little more robust/efficient in some cases.
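To match the two parameters described in the question, the find command can be wrapped in a function. This is only a sketch: search_paths is a hypothetical name, and it assumes a find that supports the "-exec ... {} +" form (if yours does not, replace + with \; as in the command above).

```shell
#!/bin/sh
# search_paths DIR STRINGS_FILE
# List every regular file under DIR that contains at least one of the
# literal strings in STRINGS_FILE (one string per line).
# search_paths is a hypothetical name; assumes find supports "-exec ... {} +".
search_paths() {
    # -l: print only names of matching files
    # -F: treat the strings as literals, -f: read them from the file
    find "$1" -type f -exec grep -l -F -f "$2" {} +
}
```

It would be called as search_paths etb strings.txt, printing one matching file name per line.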

From the question, I assume GNU coreutils cannot be used and egrep is not available.
I also assume that (for some reason) the system is broken and escapes do not work as expected.
Under normal circumstances, grep -rf patternfile.txt /some/dir/ is the way to go.
a file containing a list of all the strings to be searched
Assumptions: GNU coreutils is not available, grep -r does not work, and handling of special characters is broken.
Do you have a working awk? It makes life so much easier. But let's be on the safe side.
Assume: a working sed, and one of od, hexdump, or xxd (from the vim package) is available.
Let's call the list of strings patternfile.txt
1. Convert list into a regexp that grep likes
Example patternfile.txt contains
/foo/
/bar/doe/
/root/
(The example does not show any special characters, but assume they are there.) We must turn it into something like
(/foo/|/bar/doe/|/root/)
Assuming echo -en command is not broken, and xxd , or od, or hexdump is available,
Using hexdump
cat patternfile.txt |hexdump -ve '1/1 "%02x \n"' |tr -d '\n'
Using od
cat patternfile.txt |od -A none -t x1|tr -d '\n'
and pipe it into (common for both hexdump and od)
|sed 's:[ ]*0a[ ]*$::g'|sed 's: 0a:\\|:g' |sed 's:^[ ]*::g'|sed 's:^: :g' |sed 's: :\\x:g'
then pipe result into
|sed 's:^:\\(:g' |sed 's:$:\\):g'
and you have a regexp pattern that is escaped.
2. Feed the escaped pattern into broken regexp
Assuming the bare minimum shell escape is available,
we use grep "$(echo -en "ESCAPED_PATTERN" )" to do our job.
3. To sum it up
Building an escaped regexp pattern (using hexdump as an example)
grep "$(echo -en "$( cat patternfile.txt |hexdump -ve '1/1 "%02x \n"' |tr -d '\n' |sed 's:[ ]*0a[ ]*$::g'|sed 's: 0a:\\|:g' |sed 's:^[ ]*::g'|sed 's:^: :g' |sed 's: :\\x:g'|sed 's:^:\\(:g' |sed 's:$:\\):g')")"
will escape all characters and enclose it with (|) brackets so a regexp OR match will be performed.
4. Recursive directory lookup
Under normal circumstances, even when grep -r is broken, find /dir/ -type f -exec grep REGEXP_PATTERN {} \; should work.
Some may prefer xargs instead (unless you happen to have a buggy xargs).
We would prefer the find /somedir/ -type f -print0 | xargs -0 grep -f 'patternfile.txt' approach, but since
this is not available (for whatever valid reason),
we need to exec grep for each file, which is normally the wrong way.
But let's do it.
Assume: find -type f works.
Assume: xargs is broken or not available.
First, if you have a buggy pipe, it might not handle a large number of files.
So we avoid xargs on such systems (I know, I know, let's just pretend it is broken).
find /whatever/dir/to/start/looking/ -type f > list-of-all-file-to-search-for.txt
If your shell handles large lists nicely,
for file in `cat list-of-all-file-to-search-for.txt` ; do grep REGEXP_PATTERN "$file" ;
done ; is a nice way to get by. Unfortunately, some systems do not like that,
and in that case, you may require
cat list-of-all-file-to-search-for.txt | split -a 4 -d -l 2000 - file-smaller-chunk.part.
to turn it into smaller chunks. Now this is for a seriously broken system.
Then
for file in file-smaller-chunk.part.* ; do for single_line in `cat "$file"` ; do grep REGEXP_PATTERN "$single_line" ; done ; done ;
should work.
A loop such as
cat filelist.txt | while read file ; do grep REGEXP_PATTERN "$file" ; done ;
may be used as a workaround on some systems.
What if my shell does not handle quotes?
You may have to escape the file list beforehand.
It can be done much more nicely in awk, perl, or whatever, but since we restrict ourselves to
sed, let's do it.
We assume 0x27, the ' character, will actually work.
cat list-of-all-file-to-search-for.txt |sed 's#['\'']#'\''\\'\'\''#g'|sed 's:^:'\'':g'|sed 's:$:'\'':g'
The only time I had to use this was when feeding output into bash again.
What if my shell does not handle that?
xargs fails, grep -r fails, the shell's for loop fails.
Do we have other options? Yes.
Escape all input suitably for your shell, and make a script.
But you know what, I got bored, and writing automated scripts for csh just seems
wrong. So I am going to stop here.
Take-home note
Use the right tool for the job. Writing an interpreter in bc is perfectly
possible, but it is just plain wrong. Install coreutils, perl, a better grep,
whatever. It makes life better.

Generate a random filename in unix shell

I would like to generate a random filename in unix shell (say tcshell). The filename should consist of random 32 hex letters, e.g.:
c7fdfc8f409c548a10a0a89a791417c5
(to which I will add whatever is necessary). The point is being able to do it only in shell without resorting to a program.

Assuming you are on Linux, the following should work:
cat /dev/urandom | tr -cd 'a-f0-9' | head -c 32
This is only pseudo-random if your system runs low on entropy, but is (on Linux) guaranteed to terminate. If you require genuinely random data, cat /dev/random instead of /dev/urandom. This change will make your code block until enough entropy is available to produce truly random output, so it might slow down your code. For most uses, the output of /dev/urandom is sufficiently random.
If you are on OS X or another BSD, you need to modify it to the following:
cat /dev/urandom | env LC_CTYPE=C tr -cd 'a-f0-9' | head -c 32

Why not use the Unix mktemp command:
$ TMPFILE=`mktemp tmp.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX` && echo $TMPFILE
tmp.MnxEsPDsNUjrzDIiPhnWZKmlAXAO8983

One command, no pipe, no loop:
hexdump -n 16 -v -e '/1 "%02X"' -e '/16 "\n"' /dev/urandom
If you don't need the newline, for example when you're using it in a variable:
hexdump -n 16 -v -e '/1 "%02X"' /dev/urandom
Using "16" generates 32 hex digits.
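Where hexdump is not installed, od can play the same role. A sketch assuming /dev/urandom exists; random_hex32 is a hypothetical name:

```shell
#!/bin/sh
# random_hex32
# Print 32 lowercase hex digits without a trailing newline.
# random_hex32 is a hypothetical helper name; assumes /dev/urandom exists.
random_hex32() {
    # -An: no address column, -N16: read 16 bytes, -tx1: one hex byte per field.
    # tr then deletes the blanks and the newline that od adds.
    od -An -N16 -tx1 /dev/urandom | tr -d ' \t\n'
}
```

Used in a variable, this looks like name=$(random_hex32).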

uuidgen generates exactly this, except you have to remove hyphens. So I found this to be the most elegant (at least to me) way of achieving this. It should work on linux and OS X out of the box.
uuidgen | tr -d '-'

As you probably noticed from each of the answers, you generally have to "resort to a program".
However, without using any external executables, in Bash and ksh:
string=''; for i in {0..31}; do string+=$(printf "%x" $(($RANDOM%16)) ); done; echo $string
in zsh:
string=''; for i in {0..31}; do string+=$(printf "%x" $(($RANDOM%16)) ); dummy=$RANDOM; done; echo $string
Change the lower case x in the format string to an upper case X to make the alphabetic hex characters upper case.
Here's another way to do it in Bash but without an explicit loop:
printf -v string '%X' $(printf '%.2s ' $((RANDOM%16))' '{00..31})
In the following, "first" and "second" printf refers to the order in which they're executed rather than the order in which they appear in the line.
This technique uses brace expansion to produce a list of 32 random numbers mod 16 each followed by a space and one of the numbers in the range in braces followed by another space (e.g. 11 00). For each element of that list, the first printf strips off all but the first two characters using its format string (%.2s) leaving either single digits followed by a space each or two digits. The space in the format string ensures that there is then at least one space between each output number.
The command substitution containing the first printf is not quoted so that word splitting is performed and each number goes to the second printf as a separate argument. There, the numbers are converted to hex by the %X format string and they are appended to each other without spaces (since there aren't any in the format string) and the result is stored in the variable named string.
When printf receives more arguments than its format string accounts for, the format is applied to each argument in turn until they are all consumed. If there are fewer arguments, the unmatched format string (portion) is ignored, but that doesn't apply in this case.
I tested it in Bash 3.2, 4.4 and 5.0-alpha. But it doesn't work in zsh (5.2) or ksh (93u+) because RANDOM only gets evaluated once in the brace expansion in those shells.
Note that because of using the mod operator on a value that ranges from 0 to 32767 the distribution of digits using the snippets could be skewed (not to mention the fact that the numbers are pseudo random in the first place). However, since we're using mod 16 and 32768 is divisible by 16, that won't be a problem here.
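The divisibility argument can be checked with shell arithmetic. In this sketch, mod_is_uniform is a hypothetical name, and 32768 is the number of values $RANDOM can take (0 to 32767):

```shell
#!/bin/sh
# mod_is_uniform M
# Succeeds when the 32768 possible values of $RANDOM (0..32767)
# split evenly across the residues 0..M-1, i.e. when M divides 32768.
# mod_is_uniform is a hypothetical helper name.
mod_is_uniform() {
    [ $((32768 % $1)) -eq 0 ]
}

mod_is_uniform 16 && echo "mod 16: every digit equally likely"
mod_is_uniform 10 || echo "mod 10: some digits slightly more likely"
```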
In any case, the correct way to do this is using mktemp as in Oleg Razgulyaev's answer.

Tested in zsh, should work with any BASH compatible shell!
#!/bin/zsh
SUM=`md5sum <<EOF
$RANDOM
EOF`
FN=`echo $SUM | awk '// { print $1 }'`
echo "Your new filename: $FN"
Example:
$ zsh ranhash.sh
Your new filename: 2485938240bf200c26bb356bbbb0fa32
$ zsh ranhash.sh
Your new filename: ad25cb21bea35eba879bf3fc12581cc9

Yet another way[tm].
R=$(echo $RANDOM $RANDOM $RANDOM $RANDOM $RANDOM | md5 | cut -c -8)
FILENAME="abcdef-$R"

This answer is very similar to fmarks, so I cannot really take credit for it, but I found the cat and tr command combinations quite slow, and I found this version quite a bit faster. You need hexdump.
hexdump -e '/1 "%02x"' -n16 < /dev/urandom

Another thing you can add is running the date command as follows:
date +%S%N
This reads the current second and nanosecond count (%N is a GNU date extension), which adds some variability to the result.

The first answer is good, but why fork cat if it is not required?
tr -dc 'a-f0-9' < /dev/urandom | head -c32

Grab 16 bytes from /dev/random, convert them to hex, take the first line, remove the address, remove the spaces.
head /dev/random -c16 | od -tx1 -w16 | head -n1 | cut -d' ' -f2- | tr -d ' '
Assuming that "without resorting to a program" means "using only programs that are readily available", of course.

If you have openssl on your system, you can use it to generate random hex (or, with -base64, Base64) strings of a defined length. The argument is a byte count, so -hex 32 prints 64 hex characters and -hex 16 prints 32. I found it simple and usable in one-line cron jobs.
openssl rand -hex 32
8c5a7515837d7f0b19e7e6fa4c448400e70ffec88ecd811a3dce3272947cb452

Hope to add a (maybe) better solution to this topic.
Note: this only works with bash 4 and some implementations of mktemp (for example, the GNU one).
Try this
fn=$(mktemp -u -t 'XXXXXX')
echo ${fn/\/tmp\//}
This one is twice as fast as head /dev/urandom | tr -cd 'a-f0-9' | head -c 32, and eight times as fast as cat /dev/urandom | tr -cd 'a-f0-9' | head -c 32.
Benchmark:
With mktemp:
#!/bin/bash
# a.sh
for (( i = 0; i < 1000; i++ ))
do
fn=$(mktemp -u -t 'XXXXXX')
echo ${fn/\/tmp\//} > /dev/null
done
time ./a.sh
./a.sh 0.36s user 1.97s system 99% cpu 2.333 total
And the other:
#!/bin/bash
# b.sh
for (( i = 0; i < 1000; i++ ))
do
cat /dev/urandom | tr -dc 'a-zA-Z0-9' | head -c 32 > /dev/null
done
time ./b.sh
./b.sh 0.52s user 20.61s system 113% cpu 18.653 total

If you are on Linux, Python is usually pre-installed, so you can use something like the following:
python -c "import uuid; print(str(uuid.uuid1()))"
If you don't like the dashes, use the replace method as shown below:
python -c "import uuid; print(str(uuid.uuid1()).replace('-',''))"
