Endianness in Unix hexdump

The following *nix command pipes the raw bytes of an IP address and port (127.0.0.1:80) into the hexdump command.
printf "\x7F\x00\x00\x01\x00\x50" | hexdump -e '3/1 "%u." /1 "%u:" 1/2 "%u" "\n"'
The -e flag allows an arbitrary format to parse the input. In this case, we are parsing the first three octets of the IP into unsigned decimals followed by a dot. The final octet is also parsed into an unsigned decimal but it is followed by a colon. Finally -- and this is where the problem lies -- the 2 bytes for the port are parsed as a single unsigned decimal followed by a newline.
Depending on the endianness of the system executing this command, the result will differ. A big-endian system will properly show port 80, whereas a little-endian system will show port 20480 (the two port bytes 0x00 0x50 read as 0x5000 instead of 0x0050).
Is there any way to manipulate hexdump to be aware of endianness while still allowing the arbitrary format specification via -e?

I don't know that it can be done with hexdump, but it's easy enough
in perl:
$ printf '\x00\x50' | perl -nE 'say unpack "S>"'
80
$ printf '\x00\x50' | perl -nE 'say unpack "S<"'
20480
You can tweak that to get the format you desire. ('say'
requires perl 5.10. Use print for perl < 5.10)
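For instance, here is a minimal sketch of the full IP:port format from the question, assuming the input stores the port in network (big-endian) byte order as in the printf above:
$ printf '\x7F\x00\x00\x01\x00\x50' | perl -ne 'printf "%d.%d.%d.%d:%d\n", unpack "C4 n"'
127.0.0.1:80
Here "C4" unpacks the four IP octets as unsigned bytes and "n" unpacks the port as a big-endian ("network order") 16-bit integer, independent of the host's endianness.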
(To clarify for the person who wishes to downvote because I didn't
"answer the question". I'm suggesting that the OP replace
hexdump with perl. Downvote if you must.)

Related

How to replace a single byte in a binary file using awk

I am trying to reverse engineer a Linux box to flash my own firmware, but to do that I need to patch a binary file. The patch is actually quite simple, I just need to alter one byte at a known offset. However, the Linux box doesn't have any programs like dd, sed, awk, etc. Not even telnet (I am communicating with it over serial). However, it does have sh. Is there a way to replace a byte at a known offset just using shell commands?
Thank you.
If you have head -c (the -c option is not specified by posix), printf and tail available, then this should work:
file=pathToYourFile
address=1234       # `address=1` changes the first byte
newByteOctal=150   # the new byte value in octal (000..377)
{
  head -c "$((address-1))" "$file"
  printf '%b' "\\0$newByteOctal"
  tail -c "+$((address+1))" "$file"
} > patchedFile
Note the use of printf '%b': the \0ddd octal escape is only specified for %b arguments, whereas putting it in the format string would be parsed as a three-digit \ddd escape plus a stray digit in most shells.
If you don't have head -c, but the file is very small and does not (!) contain any null byte before the offset, then you can replace the head -c line with
printf "%.$((address-1))s" "$(< "$file")"

Transform hexadecimal representation to unicode

I'm dealing with very big files (~10Gb) containing words with ASCII representations of Unicode characters:
Nuray \u00d6zdemir
Erol \u010colakovi\u0107 \u0160ehi\u0107
I want to transform them into Unicode before inserting them into a database, like this:
Nuray Özdemir
Erol Čolaković Šehić
I've seen how to do it with vim, but it's very slow for very large files. I thought a copy/paste of the regex into sed would be OK, but it's not.
I actually get things like this:
$ echo "Nuray \u00d6zdemir" | sed -E 's/\\\u(.)(.)(.)(.)/\x\1\x\2\x\3\x\4/g'
Nuray x0x0xdx6zdemir
How can I concatenate the \x and the value of \1 \2...?
I don't want to use echo or an external program due to the size of the file, I want something efficient.
Assuming the code points in your file are within the BMP (16-bit), how about:
perl -pe 'BEGIN {binmode(STDOUT, ":utf8")} s/\\u([0-9a-fA-F]{4})/chr(hex($1))/ge' input_file > output_file
Output:
Nuray Özdemir
Erol Čolaković Šehić
I have generated a 6Gb file to test the speed efficiency.
It took approx. 10 minutes to process the entire file on my 6 year old laptop.
I hope it will be acceptable to you.
I am not a MongoDB expert at all, but what I can tell you is the following: if there is a way to do the conversion at import time, directly within the DB engine, that solution should be used. If that feature is not available, you can use a naive approach to solve it:
while read -r line; do echo -e "$line"; done < input_file
INPUT:
cat input_file
Nuray \u00d6zdemir
Erol \u010colakovi\u0107 \u0160ehi\u0107
OUTPUT:
Nuray Özdemir
Erol Čolaković Šehić
But as you have spotted yourself, this kind of line-by-line processing in the shell (one read and one echo per line, plus all the shell's bookkeeping in between) is resource intensive and not efficient for 10GB files. Note also that echo -e only understands \uHHHH escapes in bash 4.2 or later.
Or go for a smarter approach using tools that should be available in your distro example:
whatis ascii2uni
ascii2uni (1) - convert 7-bit ASCII representations to UTF-8 Unicode
Command:
ascii2uni -a U -q input_file
Nuray Özdemir
Erol Čolaković ᘎhić
Beware that the second line is subtly wrong: the tool appears to have greedily consumed \u0160e as a single code point (U+160E, ᘎ) instead of Š followed by e, so validate the output before trusting it.
You can also split (ex split command) the input file in pieces, run in parallel the conversion step on each sub file, and import each converted pieces as soon as it is available to shorten the total execution time.
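A minimal sketch of that idea, reusing the perl one-liner from above (the chunk size and file names are illustrative; the split is line-based, so no \uXXXX escape can be cut in half):
split -l 1000000 input_file chunk.
for c in chunk.*; do
  perl -pe 'BEGIN {binmode(STDOUT, ":utf8")} s/\\u([0-9a-fA-F]{4})/chr(hex($1))/ge' "$c" > "$c.out" &
done
wait
cat chunk.*.out > output_file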

Check if a file contains certain ASCII characters

I need a unix command to verify that the file has printable ASCII characters only (between ASCII hex 20 and 7E, inclusive).
I got the command below to check if a file contains non-ASCII characters, but I cannot figure out my question above.
if LC_ALL=C grep -q '[^[:print:][:space:]]' file; then
echo "file contains non-ascii characters"
else
echo "file contains ascii characters only"
fi
Nice to have:
- stop at the first match; sometimes one is enough
To find 20 to 7E characters in a file you can use:
grep -P "[\x20-\x7E]" file
Note the usage of -P to perform Perl regular expressions.
But in this case you want to check that the file contains only this kind of character. So the best thing to do is to check whether any character is not within this range, that is, check [^range]:
grep -P "[^\x20-\x7E]" file
All together, I would say:
grep -qP "[^\x20-\x7E]" file && echo "weird ASCII" || echo "clean one"
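Note that -q also makes grep stop reading at the first match (at least with GNU grep), which covers the "stop at the first match" nice-to-have from the question.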
This can be done in unix using the POSIX grep options:
if LC_ALL=C grep -q '[^ -~]' file; then
echo "file contains non-ascii characters"
else
echo "file contains ascii characters only"
fi
where the characters in [ ... ] are ^ (caret), space, - (ASCII minus sign), ~ (tilde).
You could also specify ASCII tab. The standard refers to these as collating elements. Both \x (hexadecimal) and \0 (octal) forms are shown in the standard's description of bracket expressions (see 7.4.1), so you could use \x09 or \011 for a literal tab.
According to the description, by default -e accepts a basic regular expression (BRE). If you added a -E, you could have an extended regular expression (but that is not needed).
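If you do want to allow tabs as well, a small sketch that embeds a literal tab in the bracket expression via command substitution (since basic regular expressions do not reliably interpret \t):
if LC_ALL=C grep -q "[^ -~$(printf '\t')]" file; then
  echo "file contains non-printable or non-ascii characters"
else
  echo "file contains printable ascii and tabs only"
fi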

Using grep to find a binary pattern in a file

Previously, I was able to find binary patterns in files using grep with
grep -a -b -o -P '\x01\x02\x03' <file>
By find I mean I was able to get the byte position of the pattern in the file. But when I tried doing this with the latest version of grep (v2.16) it no longer worked.
Specifically, I can manually verify that the pattern is present in the file but grep does not find it. Strangely, some patterns are found correctly but not others. For example, in a test file
000102030405060708090a0b0c0e0f
'\x01\x02' is found but not '\x07\x08'.
Any help in clarifying this behavior is highly appreciated.
Update: The above example does not show the described behavior. Here are the commands that exhibit the problem
printf `for (( x=0; x<256; x++ )); do printf "\x5cx%02x" $x; done` > test
for (( x=$((0x70)); x<$((0x8f)); x++ )); do
  p=`printf "\'\x5cx%02x\x5cx%02x\'" $x $((x+1))`
  echo -n $p
  echo $p test | xargs grep -c -a -o -b -P | cut -d: -f1
done
The first command creates a file containing all possible byte values from 0x00 to 0xff in sequence. The loop then counts the number of occurrences of each pair of consecutive byte values in the range 0x70 to 0x8f. The output I get is
'\x70\x71'1
'\x71\x72'1
'\x72\x73'1
'\x73\x74'1
'\x74\x75'1
'\x75\x76'1
'\x76\x77'1
'\x77\x78'1
'\x78\x79'1
'\x79\x7a'1
'\x7a\x7b'1
'\x7b\x7c'1
'\x7c\x7d'1
'\x7d\x7e'1
'\x7e\x7f'1
'\x7f\x80'0
'\x80\x81'0
'\x81\x82'0
'\x82\x83'0
'\x83\x84'0
'\x84\x85'0
'\x85\x86'0
'\x86\x87'0
'\x87\x88'0
'\x88\x89'0
'\x89\x8a'0
'\x8a\x8b'0
'\x8b\x8c'0
'\x8c\x8d'0
'\x8d\x8e'0
'\x8e\x8f'0
Update: The same pattern occurs for single-byte patterns -- no bytes with value greater than 0x7f are found.
The results may depend on your current locale. To avoid this, use:
LC_ALL=C grep -P "<binary pattern>" <file>
where LC_ALL=C overrides your current locale with the C locale, allowing single-byte matching. Otherwise, patterns with non-ASCII "characters" such as \xff will not match.
For example, this fails to match because (at least in my case) the environment has LANG=en_US.UTF-8:
$ printf '\x41\xfe\n' | grep -P '\xfe'
while this succeeds:
$ printf '\x41\xfe\n' | LC_ALL=C grep -P '\xfe'
A?
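To recover the byte offsets the question originally asked about, the same locale override applies; a sketch with made-up test bytes (GNU grep assumed):
$ printf '\xde\xad\xbe\xef' | LC_ALL=C grep -aboP '\xbe\xef'
2:??
The leading 2 is the 0-based byte offset of the match; the matched bytes themselves may render as ?? in a terminal.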

Generate a random filename in unix shell

I would like to generate a random filename in a unix shell (say tcsh). The filename should consist of 32 random hex characters, e.g.:
c7fdfc8f409c548a10a0a89a791417c5
(to which I will add whatever is necessary). The point is being able to do it only in the shell without resorting to a program.
Assuming you are on a linux, the following should work:
cat /dev/urandom | tr -cd 'a-f0-9' | head -c 32
This is only pseudo-random if your system runs low on entropy, but is (on linux) guaranteed to terminate. If you require genuinely random data, cat /dev/random instead of /dev/urandom. This change will make your code block until enough entropy is available to produce truly random output, so it might slow down your code. For most uses, the output of /dev/urandom is sufficiently random.
If you are on OS X or another BSD, you need to modify it to the following:
cat /dev/urandom | env LC_CTYPE=C tr -cd 'a-f0-9' | head -c 32
Why not use the unix mktemp command:
$ TMPFILE=`mktemp tmp.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX` && echo $TMPFILE
tmp.MnxEsPDsNUjrzDIiPhnWZKmlAXAO8983
One command, no pipe, no loop:
hexdump -n 16 -v -e '/1 "%02X"' -e '/16 "\n"' /dev/urandom
If you don't need the newline, for example when you're using it in a variable:
hexdump -n 16 -v -e '/1 "%02X"' /dev/urandom
Using "16" generates 32 hex digits.
uuidgen generates exactly this, except you have to remove hyphens. So I found this to be the most elegant (at least to me) way of achieving this. It should work on linux and OS X out of the box.
uuidgen | tr -d '-'
As you probably noticed from each of the answers, you generally have to "resort to a program".
However, without using any external executables, in Bash and ksh:
string=''; for i in {0..31}; do string+=$(printf "%x" $(($RANDOM%16)) ); done; echo $string
in zsh:
string=''; for i in {0..31}; do string+=$(printf "%x" $(($RANDOM%16)) ); dummy=$RANDOM; done; echo $string
Change the lower case x in the format string to an upper case X to make the alphabetic hex characters upper case.
Here's another way to do it in Bash but without an explicit loop:
printf -v string '%X' $(printf '%.2s ' $((RANDOM%16))' '{00..31})
In the following, "first" and "second" printf refers to the order in which they're executed rather than the order in which they appear in the line.
This technique uses brace expansion to produce a list of 32 random numbers mod 16, each followed by a space and one of the numbers in the braced range followed by another space (e.g. 11 00). For each element of that list, the first printf strips off all but the first two characters using its format string (%.2s), leaving either a single digit followed by a space or two digits. The space in the format string ensures that there is then at least one space between each output number.
The command substitution containing the first printf is not quoted so that word splitting is performed and each number goes to the second printf as a separate argument. There, the numbers are converted to hex by the %X format string and they are appended to each other without spaces (since there aren't any in the format string) and the result is stored in the variable named string.
When printf receives more arguments than its format string accounts for, the format is applied to each argument in turn until they are all consumed. If there are fewer arguments, the unmatched format string (portion) is ignored, but that doesn't apply in this case.
I tested it in Bash 3.2, 4.4 and 5.0-alpha. But it doesn't work in zsh (5.2) or ksh (93u+) because RANDOM only gets evaluated once in the brace expansion in those shells.
Note that because of using the mod operator on a value that ranges from 0 to 32767 the distribution of digits using the snippets could be skewed (not to mention the fact that the numbers are pseudo random in the first place). However, since we're using mod 16 and 32768 is divisible by 16, that won't be a problem here.
In any case, the correct way to do this is using mktemp as in Oleg Razgulyaev's answer.
Tested in zsh, should work with any BASH compatible shell!
#!/bin/zsh
SUM=`md5sum <<EOF
$RANDOM
EOF`
FN=`echo $SUM | awk '{ print $1 }'`
echo "Your new filename: $FN"
Example:
$ zsh ranhash.sh
Your new filename: 2485938240bf200c26bb356bbbb0fa32
$ zsh ranhash.sh
Your new filename: ad25cb21bea35eba879bf3fc12581cc9
Yet another way[tm].
R=$(echo $RANDOM $RANDOM $RANDOM $RANDOM $RANDOM | md5 | cut -c -8)
FILENAME="abcdef-$R"
This answer is very similar to fmarks's, so I cannot really take credit for it, but I found the cat and tr command combination quite slow, and this version quite a bit faster. You need hexdump.
hexdump -e '/1 "%02x"' -n32 < /dev/urandom
Another thing you can add is running the date command as follows:
date +%S%N
It reads the seconds and nanoseconds of the current time, which adds some extra variability (though clock time is not truly random).
The first answer is good, but why fork cat if it is not required?
tr -dc 'a-f0-9' < /dev/urandom | head -c32
Grab 16 bytes from /dev/random, convert them to hex, take the first line, remove the address, remove the spaces.
head /dev/random -c16 | od -tx1 -w16 | head -n1 | cut -d' ' -f2- | tr -d ' '
Assuming that "without resorting to a program" means "using only programs that are readily available", of course.
If you have openssl on your system you can use it to generate random hex strings (it can also do -base64) of a defined length. Note that the argument is the number of random bytes, not hex digits, so openssl rand -hex 32 prints 64 hex digits; use openssl rand -hex 16 for the 32 hex digits the question asks for. I found it pretty simple and usable in one-line cron jobs.
openssl rand -hex 32
8c5a7515837d7f0b19e7e6fa4c448400e70ffec88ecd811a3dce3272947cb452
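For example, to build a filename with exactly 32 hex digits directly (the prefix and extension here are illustrative):
fname="upload_$(openssl rand -hex 16).bin"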
Hoping to add a (maybe) better solution to this topic.
Notice: this only works with bash 4 and some implementations of mktemp (for example, the GNU one).
Try this
fn=$(mktemp -u -t 'XXXXXX')
echo ${fn/\/tmp\//}
This one is twice as fast as head /dev/urandom | tr -cd 'a-f0-9' | head -c 32, and eight times as fast as cat /dev/urandom | tr -cd 'a-f0-9' | head -c 32.
Benchmark:
With mktemp:
#!/bin/bash
# a.sh
for (( i = 0; i < 1000; i++ ))
do
  fn=$(mktemp -u -t 'XXXXXX')
  echo ${fn/\/tmp\//} > /dev/null
done
time ./a.sh
./a.sh 0.36s user 1.97s system 99% cpu 2.333 total
And the other:
#!/bin/bash
# b.sh
for (( i = 0; i < 1000; i++ ))
do
  cat /dev/urandom | tr -dc 'a-zA-Z0-9' | head -c 32 > /dev/null
done
time ./b.sh
./b.sh 0.52s user 20.61s system 113% cpu 18.653 total
If you are on Linux, Python is most likely preinstalled, so you can go for something similar to the below (Python 3 syntax; on old systems with only Python 2, use python and drop the parentheses):
python3 -c "import uuid; print(uuid.uuid1())"
If you don't like the dashes, then use the replace function as shown below:
python3 -c "import uuid; print(str(uuid.uuid1()).replace('-',''))"
