Grep hex characters in a file - unix

I am having some difficulty finding the number of hex characters in a file. For example:
grep -o \x02 file | wc -l
0
There should be about 3M matches here, but it doesn't seem like the \x02 character is being recognized here. For example (in python):
>>> s=open('file').read()
>>> s.count('\x02')
2932267

The answer by Mark Setchell may work on macOS, but it doesn't seem to work on Debian using bash (tested with bash 4.4, grep 2.27).
I could get a match using the -P option (Perl-compatible regex):
user@host:~ $ printf '\x02\n3\n\x02' | grep -c -P '\x02'
2
user@host:~ $ printf '\x02\n3\n\x02' | grep -c -P '\xFF' #same input, different pattern
0
user@host:~ $ printf '\x02\n3\n\xff' | grep -c -P '\xFF' #match with unmatching case
2
Hope this helps

This seems to do what you want on macOS:
printf "\x02\n3\n\x02" | grep -c "\x02"
2
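For completeness, here is a minimal sketch of two ways to count the byte that do not depend on how grep itself interprets \x02. It assumes a grep that supports -a and -o (GNU and BSD grep both do); the sample file and the variable names are made up for illustration. Note that grep -c counts matching lines, while the Python s.count() in the question counts occurrences, so the sketch counts occurrences.

```shell
# Build a small sample file with three \x02 bytes (two of them on one line).
sample=$(mktemp)
printf '\002a\002\n3\n\002\n' > "$sample"

# Option 1: let printf produce the raw byte, then hand it to grep as the pattern.
# -a treats the file as text, -o prints every match on its own line.
byte=$(printf '\002')
count_grep=$(LC_ALL=C grep -a -o "$byte" "$sample" | wc -l)

# Option 2: delete every byte except \002 and count what is left.
count_tr=$(tr -cd '\002' < "$sample" | wc -c)

# Both counts should be 3 for this sample.
echo "grep: $count_grep, tr: $count_tr"
rm -f "$sample"
```

The tr variant avoids regex and locale questions entirely, which may make it the safer choice for large binary files.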

Related

How can pipes and grep and wc be combined to just give a count of the phrase “syntax ok”

Something like the following…
cd /usr/IBMIHS/bin/ |
apachectl -t -f /usr/IBMIHS/conf/AAA/httpd.conf |
apachectl -t -f /usr/IBMIHS/conf/AAA/siteAA.conf |
grep "^Syntax OK" | wc
Simply group the commands with curly braces and use grep -c:
{
apachectl -t -f /usr/IBMIHS/conf/AAA/httpd.conf
apachectl -t -f /usr/IBMIHS/conf/AAA/siteAA.conf
} |& grep -c "Syntax OK"
From man grep
-c, --count
Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match option (see below), count non-matching lines.
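The grouping idea can be tried without Apache installed. In this sketch check_config is a hypothetical stand-in for apachectl -t -f, written to report on stderr the way the |& in the answer implies, and the config paths are placeholders. It uses 2>&1 | instead of |& so it also runs in shells other than bash 4.

```shell
# Hypothetical stand-in for `apachectl -t -f <conf>`; it reports on stderr.
check_config() {
    echo "Syntax OK" >&2
}

# The braces make both commands share one pipe; 2>&1 merges stderr into
# stdout so grep can see the messages.
ok_count=$(
    {
        check_config /path/to/first.conf
        check_config /path/to/second.conf
    } 2>&1 | grep -c "Syntax OK"
)

echo "$ok_count"
```

Because grep -c counts matching lines, the result equals the number of configs that reported "Syntax OK", as long as each check prints that phrase at most once.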

dynamically pass string to Rscript argument with sed

I wrote a script in R that has several arguments. I want to iterate over 20 directories and execute my script on each while passing in a substring from the file path as my -n argument using sed. I ran the following:
find . -name 'xray_data' -exec sh -c 'Rscript /Users/Caitlin/Desktop/DeMMO_Pubs/DeMMO_NativeRock/DeMMO_NativeRock/R/scipts/dataStitchR.R -f {} -b "{}/SEM_images" -c "{}/../coordinates.txt" -z ".tif" -m ".tif" -a "Unknown|SEM|Os" -d "overview" -y "overview" --overview "overview.*tif" -p FALSE -n "`sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/' "{}"`"' sh {} \;
which results in this error:
ubs/DeMMO_NativeRock/DeMMO_NativeRock/R/scipts/dataStitchR.R -f {} -b "{}/SEM_images" -c "{}/../coordinates.txt" -z ".tif" -m ".tif" -a "Unknown|SEM|Os" -d "overview" -y "overview" --overview "overview.*tif" -p FALSE -n "`sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/' "{}"`"' sh {} \;
sh: command substitution: line 0: syntax error near unexpected token `('
sh: command substitution: line 0: `sed -e s/.*DeMMO.*[/](.*)_.*[/]xray_data/1/ "./DeMMO1/D1T3rep_Dec2019_Ellison/xray_data"'
When I try to use sed with my pattern on an example file path, it works:
echo "./DeMMO1/D1T1exp_Dec2019_Poorman/xray_data" | sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/'
which produces the correct substring:
D1T1exp_Dec2019
I think there's an issue with trying to use single quotes inside the interpreted string but I don't know how to deal with this. I have tried replacing the single quotes around the sed pattern with double quotes as well as removing the single quotes, both result in this error:
sed: RE error: illegal byte sequence
How should I extract the substring from the file path dynamically in this case?
To loop over the output of find:
while IFS= read -ru "$fd" -d '' files; do
echo "$files" ##: do whatever you want to do with the files here.
done {fd}< <(find . -type f -name 'xray_data' -print0)
This avoids embedding commands inside quoted strings.
It reads from a dynamically allocated file descriptor ({fd}), in case something inside the loop consumes stdin.
-print0 delimits the file names with null bytes, so paths containing spaces, tabs, or newlines are handled safely.
A good first step is to put an echo in front of every command you intend to run on the files, so you can see what would be executed before anything actually happens.
This is the solution that ultimately worked for me due to issues with quotes in sed:
for dir in `find . -name 'xray_data'`;
do sampleID="`basename $(dirname $dir) | cut -f1 -d'_'`";
Rscript /Users/Caitlin/Desktop/DeMMO_Pubs/DeMMO_NativeRock/DeMMO_NativeRock/R/scipts/dataStitchR.R -f "$dir" -b "$dir/SEM_images" -c "$dir/../coordinates.txt" -z ".tif" -m ".tif" -a "Unknown|SEM|Os" -d "overview" -y "overview" --overview "overview.*tif" -p FALSE -n "$sampleID";
done
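The substring can also be derived with shell parameter expansion, which sidesteps the nested quoting around sed altogether. This is only a sketch: sample_id is a made-up helper name, the directory tree is a throwaway built for the demo, and the Rscript call is echoed rather than run. It mirrors the cut -f1 -d'_' behaviour of the solution above; using ${parent%_*} instead would give D1T1exp_Dec2019, like the original sed pattern.

```shell
# Derive the sample ID from a path such as
# ./DeMMO1/D1T1exp_Dec2019_Poorman/xray_data
sample_id() {
    parent=$(basename "$(dirname "$1")")   # e.g. D1T1exp_Dec2019_Poorman
    printf '%s\n' "${parent%%_*}"          # everything before the first "_"
}

# Demo on a throwaway tree; echo the command instead of running it.
demo=$(mktemp -d)
mkdir -p "$demo/DeMMO1/D1T1exp_Dec2019_Poorman/xray_data"
find "$demo" -name 'xray_data' | while IFS= read -r dir; do
    echo "would run: Rscript dataStitchR.R -f $dir -n $(sample_id "$dir")"
done
rm -rf "$demo"
```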

UNIX for loop with some options

I have for loop:
for mnt `cat $file.txt`
do
grep -h -i -A 3 -B 4 *log | grep -v "10001" >> extrafile.txt
done
What do -A 3 and -B 4 mean?
After and Before, each followed by a number of lines.
After and Before. After and before what?
No wonder the grep is confusing: You don't mention the "${mnt}" you are searching for. When I improve your script (moving input and output to the end, outside the loop, and using ${mnt}), the script looks like
while read -r mnt; do
grep -h -i -A 3 -B 4 "${mnt}" *log | grep -v "10001"
done < "${file}.txt" >> extrafile.txt
For every line in $file.txt you get each hit in the log files together with its context (4 lines before, 3 lines after), and lines containing 10001 are filtered out.
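To see what the context options do in isolation, here is a small sketch on a generated nine-line file (the file contents and variable names are invented for the demo):

```shell
# Create a file with the lines line1 ... line9.
log=$(mktemp)
printf 'line%s\n' 1 2 3 4 5 6 7 8 9 > "$log"

# -B 1 prints one line before each match, -A 2 prints two lines after it.
context=$(grep -B 1 -A 2 'line5' "$log")

# Prints line4 through line7.
echo "$context"
rm -f "$log"
```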

xargs to copy one file into several

I have a directory that has one file with information (call it masterfile.inc) and several files that are empty (call them file1.inc-file20.inc)
I'm trying to formulate an xargs command that copies the contents of masterfile.inc into all of the empty files.
So far I have
ls -ltr | awk '{print $9}' | grep -v masterfile | xargs -I {} cat masterfile.inc > {}
Unfortunately, all this does is creates a file called {} and prints masterfile.inc into it N times.
Is there something I'm missing with the syntax here?
Thanks in advance
You can use this command to copy the file 20 times:
$ tee <masterfile.inc >/dev/null file{1..20}.inc
Note: file{1..20}.inc will expand to file1.inc, file2.inc, ... , file20.inc
If your destination filenames are arbitrary:
$ shopt -s extglob
$ tee <masterfile.inc >/dev/null $(ls !(masterfile.inc))
Note: $(ls !(masterfile.inc)) expands to every file in the current directory except masterfile.inc. Because the output of ls is word-split, this breaks on filenames that contain spaces.
While the tee trick is really brilliant you might be interested in a solution that is easier to adapt for other situations. Here using GNU Parallel:
ls -ltr | awk '{print $9}' | grep -v masterfile | parallel "cat masterfile.inc > {}"
It takes literally 10 seconds to install GNU Parallel:
wget pi.dk/3 -qO - | sh -x
Watch the intro videos to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
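As for why the original attempt fails: the redirection > {} is processed by the shell before xargs even starts, so all output goes to a file literally named {}. A plain loop over a glob avoids both that problem and the parsing of ls output. The following is a sketch on a throwaway directory with made-up file names, one of which contains a space on purpose:

```shell
# Set up a scratch directory with one master file and three empty targets.
work=$(mktemp -d)
printf 'shared settings\n' > "$work/masterfile.inc"
touch "$work/file1.inc" "$work/file2.inc" "$work/file 3.inc"

# Copy the master over every other .inc file; quoting keeps spaces intact.
for target in "$work"/*.inc; do
    if [ "$(basename "$target")" != "masterfile.inc" ]; then
        cp "$work/masterfile.inc" "$target"
    fi
done

# Read one of the targets back to confirm the copy, then clean up.
copied=$(cat "$work/file 3.inc")
echo "$copied"
rm -rf "$work"
```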

Multiple grep search/ignore patterns

I usually use the following pipeline to grep for a particular search string and yet ignore certain other patterns:
grep -Ri 64 src/install/ | grep -v \.svn | grep -v "file"| grep -v "2\.5" | grep -v "2\.6"
Can this be achieved in a succinct manner? I am using GNU grep 2.5.3.
Just pipe your unfiltered output into a single instance of grep and use an extended regexp to declare what you want to ignore:
grep -Ri 64 src/install/ | grep -v -E '(\.svn|file|2\.5|2\.6)'
Edit: To search multiple files maybe try
find ./src/install -type f -print |\
grep -v -E '(\.svn|file|2\.5|2\.6)' | xargs grep -i 64
Edit: Ooh. I forgot to add the simple trick to stop a cringeable use of multiple grep instances, namely
ps -ef | grep something | grep -v grep
Replacing that with
ps -ef | grep "[s]omething"
removes the need of the second grep.
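The trick works because the grep process shows up in the process list with the literal text [s]omething on its command line, and the regex [s]omething matches "something" but not "[s]omething". A sketch with two invented lines standing in for ps -ef output makes this visible:

```shell
# Two made-up process-list lines: the real process and the grep itself.
ps_lines='user 101 something --daemon
user 202 grep [s]omething'

# Only the first line contains the text "something" without brackets.
matches=$(printf '%s\n' "$ps_lines" | grep -c '[s]omething')
echo "$matches"
```

Where it is available, pgrep is another way to get the same result without a second grep.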
Use the -e option to specify multiple patterns:
grep -Ri 64 src/install/ | grep -v -e '\.svn' -e file -e '2\.5' -e '2\.6'
You might also be interested in the -F flag, which indicates that patterns are fixed strings instead of regular expressions. Now you don't have to escape the dot:
grep -Ri 64 src/install/ | grep -vF -e .svn -e file -e 2.5 -e 2.6
I noticed you were grepping out ".svn". You probably want to skip any directories named ".svn" in your initial recursive grep. If I were you, I would do this instead:
grep -Ri 64 src/install/ --exclude-dir .svn | grep -vF -e file -e 2.5 -e 2.6
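The difference -F makes is easy to miss, so here is a sketch on invented input lines. Without -F an unescaped pattern such as 2.6 is a regex in which the dot matches any character, so it also drops a line that merely contains "216":

```shell
# Made-up lines in the shape of `grep -R` output.
lines='src/install/a.c: 64 bit
src/install/.svn/entries: 64
src/install/file.txt: 64
src/install/notes: version 2.5 on 64
src/install/b.c: build 216 on 64'

# -F: patterns are fixed strings, so "2.6" only matches a literal "2.6".
kept_fixed=$(printf '%s\n' "$lines" | grep -vcF -e .svn -e file -e 2.5 -e 2.6)

# No -F and no escaping: "2.6" is a regex and also matches "216".
kept_regex=$(printf '%s\n' "$lines" | grep -vc -e .svn -e file -e 2.5 -e 2.6)

echo "fixed: $kept_fixed, regex: $kept_regex"
```

With this input the fixed-string filter keeps two lines and the unescaped regex filter keeps only one.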
You can use awk instead of grep:
awk '/64/&&!/(\.svn|file|2\.[56])/' file
You may want to use ack-grep, which lets you exclude with Perl regexps as well and skips all the version-control directories; great for grepping source code.
The following script will remove all files except a list of files:
echo cleanup_all "$@"
if [[ $# -eq 0 ]]; then
    FILES=`find . -type f`
else
    EXCLUDE_FILES_EXP="("
    for EXCLUDED_FILE in "$@"
    do
        EXCLUDE_FILES_EXP="$EXCLUDE_FILES_EXP./$EXCLUDED_FILE|"
    done
    # strip the trailing "|"
    EXCLUDE_FILES_EXP="${EXCLUDE_FILES_EXP%?}"
    EXCLUDE_FILES_EXP="$EXCLUDE_FILES_EXP)"
    echo "excluded files expression: $EXCLUDE_FILES_EXP"
    FILES=`find . -type f | grep -E -v "$EXCLUDE_FILES_EXP"`
fi
echo removing $FILES
# $FILES is left unquoted so it splits into one word per file;
# this breaks on filenames that contain whitespace.
for FILE in $FILES
do
    echo "cleanup: removing file $FILE"
    rm "$FILE"
done
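A variant that lets find do the excluding avoids both the regex building and the word splitting. This is a sketch only: list_removable is a made-up name, the demo directory is a throwaway, and it just lists the candidates so the output can be reviewed before anything is deleted. Note that -path treats its argument as a glob pattern, so names containing * or ? would need escaping.

```shell
# Print every regular file under the current directory except the ones
# named as arguments (relative to ".").
list_removable() {
    # Replace each argument NAME with the find test: ! -path ./NAME
    for keep do
        shift
        set -- "$@" ! -path "./$keep"
    done
    find . -type f "$@"
}

# Demo in a scratch directory; one name contains a space on purpose.
demo=$(mktemp -d)
removable=$(cd "$demo" && touch keep.txt 'old one.log' tmp.dat &&
            list_removable keep.txt | sort)
echo "$removable"
rm -rf "$demo"
```

Once the list looks right, the find call inside the function could be extended with -exec rm -- {} + to perform the removal.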
