bzgrep not printing the file name - unix

find . -name '{fileNamePattern}*.bz2' | xargs -n 1 -P 3 bzgrep -H "{patternToSearch}"
I am using the command above to search a set of .bz2 files for a pattern. It does go through the files, because I can see the pattern I am looking for being printed to the console, but I don't see the file names.

If you look at the bzgrep script (for example, this version for OS X) you will see that it pipes the output from bzip2 through grep. That process loses the original file names: grep never sees them, so it cannot print them (despite your -H flag).
Something like the following should do; it is not exactly the output you asked for, but it is close. (You could get the filename prefix you were expecting by piping the output from bzgrep into sed or awk, but that makes for a less simple command; a sketch follows below.)
find . -name '{fileNamePattern}*.bz2' -printf '### %p\n' -exec bzgrep "{patternToSearch}" {} \;
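For completeness, a sketch of that sed variant (my own, not from the original answer; it assumes file paths contain no '|', '&', or newline characters):
find . -name '{fileNamePattern}*.bz2' -exec sh -c '
    # prefix each matching line with the file name, mimicking grep -H
    bzgrep "{patternToSearch}" "$1" | sed "s|^|$1:|"
' sh {} \;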

I printed the file name using echo and xargs.
find . -name "*bz2" | parallel -j 128 echo -n {}\" \" | xargs bzgrep {pattern}

Etan is very close with his answer: grep indeed does not show the file name when it is dealing with only one file, so you can make grep believe it is looking at multiple files just by adding /dev/null as a second file. The command becomes:
find . -name '{fileNamePattern}*.bz2' -printf '### %p\n' \
    -exec bzgrep "{patternToSearch}" {} /dev/null \;
(It's a dirty trick, but it has been serving me well for more than 15 years :-) )

Related

Get only file name from find command and mail

The following find command finds multiple files and mails all of them:
find /home/cde -ctime -1 -name "Sum*pdf*" -exec uuencode {} {} \; | mailx -s "subject" abc@gmail.com
but I am getting attachments named like "homecdeSum123.pdf" and "homecdeSum324.pdf". How do I get the exact file names in my attachments? Please help me with this.
Trying to answer what seems to be at least part of your question
"but I am getting attachments like "homecdeSum123.pdf" and "homecdeSum324.pdf". How do I get the exact file names in my attachments?"
The accepted answer to this question:
find: What's up with basename and dirname?
contains a lot of useful information in my opinion, but I am trying to extract what could be the answer to the above:
What you describe is that when you are doing
find /home/cde ....
you are getting file names like "homecdeSum123.pdf", so my take is that you want only the basename of the file, not the directory part as well.
This can be done like this (as an example, only listing the names):
find "$(pwd)" -name "*.png" -exec basename {} \;
A slight variation of this is to use -execdir instead of -exec, which (quoting the man page) is:
-execdir command {} +
Like -exec, but the specified command is run from the subdirectory containing the matched file, which is not normally the directory in which you started find.
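For example, a sketch (not from the original answer) applied to the question's files:
find /home/cde -name "Sum*pdf*" -execdir echo {} \;
This would print ./Sum123.pdf rather than /home/cde/Sum123.pdf, since GNU find prefixes the name with ./ for safety.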
Does this help?
To get all attachments in a single mail:
find /home/cde -ctime -1 -name "Sum*pdf*" | while read name; do uuencode "$name" "${name##*/}"; done | mailx -s "subject" abc@example.com
To get a separate mail per file:
find /home/cde -ctime -1 -name "Sum*pdf*" | while read name; do uuencode "$name" "${name##*/}" | mailx -s "subject" abc@example.com; done

Unix to find pdf files from list in text file

I have a directory (for Endnote) that is filled with PDF files (1000's of them). I have used Unix to print a list of all of the pdf files and saved this list as a text file. Most of these pdf files are located in other directories throughout my computer (duplicates).
Now, I want to use the find command to search for duplicates of these pdf files throughout the rest of my computer and, if a duplicate is found, move it to a new directory. If a specific file name is found more than once, I want to give each copy a unique name (i.e. basename.pdf.1, basename.pdf.2, etc.). At the end, I want a single directory with all the duplicates so I can double-check them and then delete them.
However, I do not want find to search the directory in which my list was made from or my Dropbox, as I do not want to move these pdf files (only move the other pdfs scattered throughout my computer).
I have found (I think) how to do all of the individual steps that I need to complete this task, but I cannot seem to put everything together into a working Unix command.
1) In order to find files while excluding a directory:
find -name "what to search for" -not -path "excluded_directory"
or
find build -not \( -path excluded_directory1 -prune \) -not \( -path excluded_directory2 -prune \) -name \*.what_to_find
or my current favorite
find . -name '*.what_to_find' | grep -v excludeddir1 | grep -v excludeddir2
2) In order to read a text file into find and use the lines as search patterns:
find . -type f -print | fgrep -f file_list.txt
3) In order to find and move files:
find / -iname "*.what_to_find" -type f -exec mv {} /new_directory \;
or
find / -iname "*.what_to_find" -type f | xargs -I '{}' mv '{}' /new_directory
or (to rename files so that files with the same name do not simply overwrite each other). I haven't quite figured out everything going on in this command yet...
find -name '*.what_to_find' -type f -exec bash -c 'mv -v "$0" "./$( mktemp "$( basename "$0" ).XXX" )"' '{}' \;
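(My reading of that command: find hands each matched path to bash as $0, via the '{}' after the quoted script; mktemp creates an empty, uniquely suffixed file named after the match in the current directory and prints that name; mv then overwrites that empty file with the real one, which guarantees a unique target name.)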
So, I can execute these commands individually, but I have not been able to get them to work together as desired (maybe my order of commands is wrong? other problems?).
find . type f -print | fgrep -f file_list.txt | grep -v excludeddir1 | grep -v excludeddir2 -exec bash -c 'echo mv -v "$0" "./$( mktemp "$( basename "$0" ).XXX" )"' '{}' \;
Any help is much appreciated!
Thanks,
Derrick
Well, I wasn't able to complete this task exactly how I wanted to, but I found a workaround that got the job done.
I printed a list of all the PDFs I have in Endnote, then deleted the path names, leaving just the file names (using the find-and-replace function in TextWrangler). I then used the find command to search my computer against this list, printing all occurrences of each PDF.
Then, in TextWrangler, I deleted all lines containing the initial path to my Endnote PDFs, leaving just the desired duplicates.
Next, I used the find command to search for these exact paths and move them to a new folder.
All in all, I got by with the exact same commands I have in my original post, plus a little help from TextWrangler. Unfortunately, I never figured out how to combine all my desired steps into a single Unix command.
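For the record, here is one way the pieces might have been combined (a sketch only, untested; file_list.txt, excludeddir1/excludeddir2, and /new_directory are the placeholders from the original post, and it assumes file names contain no newlines):
find . -type f -print \
    | fgrep -f file_list.txt \
    | grep -v excludeddir1 \
    | grep -v excludeddir2 \
    | while IFS= read -r f; do
          # mktemp creates an empty, uniquely suffixed target in /new_directory
          # and prints its name; mv then replaces it with the duplicate PDF
          mv -v "$f" "$(mktemp "/new_directory/$(basename "$f").XXX")"
      done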

UNIX get info about file in all directories matching a pattern

I have a bunch of directories that all contain a file /SubDir1/SubDir2/File, and I want to see the memory of each file under directories matching a certain pattern. How do I do this?
So far I have ls -l | grep "pattern*" to get a list of the directories, but I am stuck at this point.
You should use the find command:
find . -name 'pattern*' -printf '%s\t%p\n'
By "memory of each file" I guess you mean file size.
The find command will do a better job:
find . -name "pattern*" -exec du -b {} \;
This will print the size of every matching file in your directory tree along with its path.
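If the layout is literally somedir/SubDir1/SubDir2/File, a more targeted variant might be (a sketch, assuming GNU find's -printf as in the previous answer):
find . -path './pattern*/SubDir1/SubDir2/File' -printf '%s\t%p\n'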
Bash Pitfall #1: Don't parse ls
You can use find or shell patterns:
for i in pattern*; do
    cat "$i"
done
Part of your problem is getting a list of all files under a set of matching directories; you can do that with a more elaborate pattern:
for i in pattern*/*; do
    if [ -f "$i" ]; then
        cat "$i"
    fi
done
In addition to what SirDarius said, you can also use the -R option to ls to get a recursive listing.
Something like ls -lRh | grep "pattern" should do what you want.

How to limit grep to only search the files that you want

We have a rather large and complex file system, and I am trying to generate a list of files containing a particular text string. This should be simple, but I need to exclude the './svn' and './pdv' directories (and probably others) and to look only at files of type *.p, *.w, or *.i.
I can easily do this with a program, but it is proving very slow to run. I want to speed up the process (so that I'm not searching thousands of files repeatedly) as I need to run such searches against a long list of criteria.
Normally, we search the file system using:
find . -name "*.[!r]*" -exec grep -i -l "search for me" {} \;
This works, but I then have to use a program to exclude the unwanted directories, so it runs very slowly.
After looking at the topics here:
Stack Overflow thread
I've decided to try a few other approaches:
grep -ilR "search for me" . --exclude ".svn" --exclude "pdv" --exclude "!.{p,w,i*}"
Excludes the './svn' directory but not './pdv', and doesn't limit the files looked at.
grep -ilR "search for me" . --exclude ".svn" --exclude "pdv" --include "*.p"
Excludes the './svn' directory but not './pdv', and doesn't limit the files looked at.
find . -name "*.[!r]*" -exec grep -i -l ".svn" | grep -i -l "search for me" {} \;
I can't even get this (or variations on it) to run successfully.
find . ! -name "*.svn*" -prune -print -exec grep -i -l "search for me" {} \;
Doesn't return anything. It looks like it stops as soon as it finds the .svn directory.
How about something like:
find . \( \( -name .svn -o -name pdv \) -type d -prune \) -o \( -name '*.[pwi]' -type f -exec grep -i -l "search for me" {} + \)
This will:
- ignore the contents of directories named .svn and pdv
- grep files (and symlinks to files) named *.[pwi]
The + at the end of the -exec clause means: gather as many files into a single command line as will fit (roughly a million characters on Linux). This can seriously speed up processing when you have to iterate over thousands of files.
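As a rough illustration of the difference (hypothetical file names):
find . -name '*.p' -exec grep -il "search for me" {} \;   # one grep process per file
find . -name '*.p' -exec grep -il "search for me" {} +    # many files batched per grep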
The following command finds only *.rb files containing the line require 'bundler/setup' and excludes .git and .bundle directories from the search. That is the same use case, I think.
grep -ril --exclude-dir .git --exclude-dir .bundle \
--include \*.rb "^require 'bundler/setup'$" .
I believe the problem was mixing up the --exclude and --exclude-dir parameters. Refer to the grep(1) manual.
Also note that the exclude/include parameters accept globs only, not regexps; a single-character suffix range can therefore be handled with one --include parameter, but more complex conditions require more of them:
--include \*.[pwi] --include \*.multichar_sfx ...
You can try the following:
find path_starting_point -type f | grep regex_to_filter_file_names | xargs grep regex_to_find_inside_matched_files
find . -name "filename_regex"|grep -v '.svn' -v '.pdv'|xargs grep -i 'your search string'

UNIX: rename files piped from find command

I basically want to add a string to all the files in a directory that are locked. I'm having trouble passing the filenames to a mv command:
find . -flags uchg -exec chflags nouchg "{}" | mv "{}" "{}"_LOCK \;
The above code obviously doesn't work, but I think it explains what I'm trying to do.
I'm facing two problems:
Adding a string to the end of a filename but before the extension (001_LOCK.jpg).
Passing the output of the find command twice. I need to do this because it won't let me change the names of the files while they are locked. So I need to unlock the file and then rename it.
Does anyone have any ideas?
This should be a good start.
I assume you do not really want to pipe chflags into mv (which doesn't make sense), but just to rename the file if chflags fails. Processing the extension is trickier but certainly doable; see the sketch after the commands.
find . -flags uchg -exec sh -c "chflags nouchg \$0 || mv \$0 \$0_LOCK" {} \;
Edit: rename if chflags succeeds:
find . -flags uchg -exec sh -c "chflags nouchg \$0 && mv \$0 \$0_LOCK" {} \;
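If you also want the suffix inserted before the extension, as in the question (001.jpg becoming 001_LOCK.jpg), something like this sketch could work; it assumes every matched file actually has an extension:
find . -flags uchg -exec sh -c '
    # ${0%.*} strips the last .suffix; ${0##*.} keeps only the suffix
    chflags nouchg "$0" && mv "$0" "${0%.*}_LOCK.${0##*.}"
' {} \;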
