Unix: Recursively move all files while keeping the directory structure

I have a folder named "in" that contains several folders "a", "b", "c", and I want to move all files to the folder "proc" and compress them. The tricky part is that files in "in/a" have to be moved to "proc/a", files in "in/b" to "proc/b", and so on.
I managed to find all files and zip them with this command:
find . -type f ! \( -name "*gz" -o -name "*tmp" -o -name "*xftp" \) -exec gzip -n '{}' \;
But I can't find a generic command to move the files without my having to name each folder. Can anyone give me a hand?

Well, I ended up finding I had a couple more problems, for example the target folder not existing, so I ended up using this code:
find . -type f ! \( -name "*gz" -o -name "*tmp" -o -name "*xftp" \) -exec gzip -n '{}' \;
find . -name "*.gz" | cpio -p -dumv "$1"
if [ "$?" = "0" ]; then
    find . -name "*.gz" -exec rm -rf {} \;
else
    echo "cpio Failed!" 1>&2
    exit 1
fi
The first line finds all files to be processed and zips them.
The second line finds all the .gz files and copies them to the target dir, in my case $1 (argument 1), creating as many folders as necessary to preserve the same structure.
The third part checks the exit status of the last command: if it worked, it finds and removes all .gz files from the source folder without deleting any folder; if it failed, nothing is deleted so I can analyse what happened (maybe it ran out of space).
I bet there's a faster way of doing this without using so much disk space, but since that was not a problem for me it looks acceptable.
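For the record, the copy-then-delete pass can be avoided by moving each file as it is compressed, which roughly halves the peak disk usage. A minimal sketch, assuming the target root comes in as $1 as above and that file names contain no newlines:

#!/bin/sh
# Compress each file in place, then move the resulting .gz into the
# mirrored directory under the target root given as $1.
set -e
dest=$1
find . -type f ! \( -name "*gz" -o -name "*tmp" -o -name "*xftp" \) -print |
while IFS= read -r f; do
    gzip -n "$f"                        # produces "$f.gz" next to the original
    mkdir -p "$dest/$(dirname "$f")"    # recreate the source directory tree
    mv "$f.gz" "$dest/$(dirname "$f")/"
done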

Related

Unix 'find' without descending into matched directories

I was trying to remove all git files from a repository with this:
find . -name ".git*" -exec rm -rf {} \;
However, rm warns that files could not be deleted (because their parent directory has already been deleted).
Is there a way to get find to stop recursing when it finds a matching directory?
E.g. find...
/.gitmodules
/.git/stuff
/.git/.gitfile
... produces
/.gitmodules
/.git
Use -depth:
find . -depth -name ".git*" -exec rm -rf {} \;
With -depth, find processes a directory's contents before the directory itself, so every file or subdirectory is handled before its parent is removed.
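If the goal is literally to stop descending into matched directories, -prune does that directly (a minimal sketch):

find . -name ".git*" -prune -exec rm -rf {} \;

Because -prune keeps find from recursing into a matched directory, this visits /.gitmodules and /.git but never /.git/stuff, matching the expected output above.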

Unix to find pdf files from list in text file

I have a directory (for Endnote) that is filled with PDF files (thousands of them). I have used Unix to print a list of all of the PDF files and saved this list as a text file. Most of these PDF files are duplicated in other directories throughout my computer.
Now, I want to use the find command to search for duplicates of these PDF files throughout the rest of my computer and, if a duplicate is found, move it to a new directory. If a specific file name is found more than once, I want to give each copy a unique name (i.e. basename.pdf.1, basename.pdf.2, etc.). At the end, I want a single directory holding all the duplicates so I can double-check them and then delete them.
However, I do not want find to search the directory my list was made from, or my Dropbox, as I do not want to move those PDF files (only the other PDFs scattered throughout my computer).
I have found (I think) how to do all of the individual steps that I need to complete this task, but I cannot seem to put everything together into a working Unix command.
1) In order to find files while excluding a directory:
find -name "what to search for" -not -path "excluded_directory"
or
find build -not \( -path excluded_directory1 -prune \) -not \( -path excluded_directory2 -prune \) -name \*.what_to_find
or my current favorite
find . -name '*.what_to_find' | grep -v excludeddir1 | grep -v excludeddir2
2) In order to read a text file into find and use the lines as search patterns:
find . -type f -print | fgrep -f file_list.txt
3) to find and move files
find / -iname "*.what_to_find" -type f -exec mv {} /new_directory \;
or
find / -iname "*.what_to_find" -type f | xargs -I '{}' mv '{}' /new_directory
or (to rename files so that files with the same name don't overwrite each other; here bash -c receives each found path as $0, and mktemp appends a unique suffix to its basename):
find -name '*.what_to_find' -type f -exec bash -c 'mv -v "$0" "./$( mktemp "$( basename "$0" ).XXX" )"' '{}' \;
So, I can execute these commands individually, but I have not been able to get them to work together as desired (maybe my order of commands is wrong? other problems?).
find . -type f -print | fgrep -f file_list.txt | grep -v excludeddir1 | grep -v excludeddir2 -exec bash -c 'echo mv -v "$0" "./$( mktemp "$( basename "$0" ).XXX" )"' '{}' \;
Any help is much appreciated!
Thanks,
Derrick
Well, I wasn't able to complete this task exactly how I wanted to, but I found a workaround that got the job done.
I printed a list of all PDFs I have in Endnote, then deleted the path names, leaving just the file names (find-and-replace in TextWrangler). I then used the find command to search my computer against this list, printing all occurrences of each PDF.
Then, in TextWrangler, I deleted all lines containing the initial path to my Endnote PDFs, leaving just the desired duplicates.
Next, I used the find command to search for these exact paths and move them to a new folder.
All in all, I got by with the exact same commands I have in my original post, and a little help from TextWrangler. Unfortunately I never figured out how to combine all my desired steps into a single Unix command.
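For completeness, one way the steps might be combined into a single pipeline (a sketch only: file_list.txt is assumed to hold one file name per line, dup_dir is a hypothetical destination, the -path patterns are stand-ins for the Endnote and Dropbox trees, and paths containing newlines would break the while loop):

find / -type f -iname '*.pdf' \
    -not -path '*/Endnote/*' -not -path '*/Dropbox/*' -print |
grep -F -f file_list.txt |
while IFS= read -r f; do
    # mktemp creates a uniquely suffixed placeholder, which mv then replaces
    mv -v "$f" "$(mktemp "dup_dir/$(basename "$f").XXX")"
done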

How to limit grep to only search the files that you want

We have a rather large and complex file system and I am trying to generate a list of files containing a particular text string. This should be simple, but I need to exclude the './.svn' and './pdv' directories (and probably others), and to look only at files of type *.p, *.w, or *.i.
I can easily do this with a program, but it is proving very slow to run. I want to speed up the process (so that I'm not searching thousands of files repeatedly) as I need to run such searches against a long list of criteria.
Normally, we search the file system using:
find . -name "*.[!r]*" -exec grep -i -l "search for me" {} \;
This works, but I'm then having to use a program to exclude the unwanted directories, so it is running very slowly.
After looking at the topics here:
Stack Overflow thread
I've decided to try a few other approaches:
grep -ilR "search for me" . --exclude ".svn" --exclude "pdv" --exclude "!.{p,w,i*}"
This excludes './.svn' but not './pdv', and doesn't limit which files are looked at.
grep -ilR "search for me" . --exclude ".svn" --exclude "pdv" --include "*.p"
This also excludes './.svn' but not './pdv', and doesn't limit which files are looked at.
find . -name "*.[!r]*" -exec grep -i -l ".svn" | grep -i -l "search for me" {} \;
I can't even get this (or variations on it) to run successfully.
find . ! -name "*.svn*" -prune -print -exec grep -i -l "search for me" {} \;
Doesn't return anything. It looks like it stops as soon as it finds the .svn directory.
How about something like:
find . \( \( -name .svn -o -name pdv \) -type d -prune \) -o \( -name '*.[pwi]' -type f -exec grep -i -l "search for me" {} + \)
This will:
- ignore the contents of directories named .svn and pdv
- grep files named *.[pwi]
The + after -exec means: gather as many files into each command invocation as will fit on the command line (roughly 1 million chars on Linux). This can seriously speed up processing if you have to iterate over thousands of files.
The following command finds only *.rb files containing a require 'bundler/setup' line, and excludes .git and .bundle directories from the search. That is the same use case, I think.
grep -ril --exclude-dir .git --exclude-dir .bundle \
--include \*.rb "^require 'bundler/setup'$" .
I believe the problem was using --exclude where --exclude-dir was needed. Refer to the grep(1) manual.
Also note that the exclude/include parameters accept globs only, not regexps; a single-character suffix range can therefore be done with one --include parameter, but more complex conditions require more of them:
--include \*.[pwi] --include \*.multichar_sfx ...
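Applied to the original question, that would look something like this (a sketch, assuming GNU grep and that the directory names are literally .svn and pdv):

grep -ril --exclude-dir=.svn --exclude-dir=pdv --include='*.[pwi]' "search for me" .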
You can try the following:
find path_starting_point -type f | grep regex_to_filter_file_names | xargs grep regex_to_find_inside_matched_files
find . -name "filename_regex"|grep -v '.svn' -v '.pdv'|xargs grep -i 'your search string'

Why doesn't my 'find' work like I expect using -exec?

I'm trying to remove all the .svn directories from a working directory.
I thought I would just use find and rm like this:
find . -iname .svn -exec 'rm -rf {}' \;
But the result is:
find: rm -rf ./src/.svn: No such file or directory
Obviously the file exists, or find wouldn't find it... What am I missing?
You shouldn't put the rm -rf {} in single quotes.
As you've quoted it, the whole string is passed to -exec as a single word, so find looks for an executable literally named "rm -rf ./src/.svn" rather than rm with arguments, and doesn't find it.
Try:
find . -iname .svn -exec rm -rf {} \;
Just by-the-bye, you should probably get out of the habit of using -exec for things that can be done to multiple files at once. For instance, I would write that out of habit as
find . -iname .svn -print | xargs rm -rf
or, since I'm now using a Macintosh and more likely to encounter file or directory names with spaces in them
find . -iname .svn -print0 | xargs -0 rm -rf
"xargs" makes sure that "rm" is invoked only once every "N" files, where "N" is usually 20. That's not a huge win in this case, because rm is small, but if the program you wanted to execute on every file was large or did a lot of processing on start up, it could make things a lot faster.
Maybe it's just me, but the old find & rm script does not work on my current config, a la:
find /data/bin/test -type d -mtime +10 -name "[0-9]*" -exec rm -rf {} \;
whereas the xargs solution does, a la:
find /data/bin/test -type d -mtime +10 -name '[0-9]*' -print | xargs rm -rf
No idea why, but I've updated my script library so I don't spend another couple of hours beating my head on something so simple...
(running RHEL under kernel-2.6.18-194.11.3.el5)
EDIT: I found out why: my RHEL distro's vi defaults to inserting the dreaded CR into line breaks (which breaks the command). Following suggestions from nx5000 & jliagre at linuxquestions.org, I added the following to ~/.vimrc:
:set fileformat=unix
map <F4> :set fileformat=unix<CR>
map <F5> :set fileformat=dos<CR>
which lets the file format pivot on F4/F5.
To check whether CRs are embedded in your file:
head -1 scriptFile.sh | od -c | head -1
http://www.linuxquestions.org/questions/linux-general-1/bad-interpreter-no-such-file-or-directory-213617/
You can also use the svn command as follows:
svn export <url-to-repo> <dest-path>
See the svn documentation for more info.
Try
find . -iname .svn -exec rm -rf {} \;
and that probably ought to work IIRC.
You can pass anything you want in quotes, with the following trick.
find . -iname .svn -exec bash -c 'rm -rf {}' \;
The exec option is happy to see that you're simply calling an executable with an argument, but that argument can contain a script of basically any size and shape.
find . -iname .svn -exec bash -c '
ls -l "{}" | wc -l
' \;
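A note of caution on that trick: embedding {} inside the script substitutes file names into shell code, which misbehaves on names containing quotes. A variant that passes the name as a positional argument instead (a sketch):

find . -iname .svn -exec bash -c 'ls -l "$1" | wc -l' _ '{}' \;

Here the underscore fills $0 and the found path arrives safely as $1.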

How do you recursively unzip archives in a directory and its subdirectories from the Unix command-line?

The unzip command doesn't have an option for recursively unzipping archives.
If I have the following directory structure and archives:
/Mother/Loving.zip
/Scurvy/Sea Dogs.zip
/Scurvy/Cures/Limes.zip
And I want to unzip all of the archives into directories with the same name as each archive:
/Mother/Loving/1.txt
/Mother/Loving.zip
/Scurvy/Sea Dogs/2.txt
/Scurvy/Sea Dogs.zip
/Scurvy/Cures/Limes/3.txt
/Scurvy/Cures/Limes.zip
What command or commands would I issue?
It's important that this doesn't choke on filenames that have spaces in them.
If you want to extract each archive into the folder it resides in, you can try this:
find . -name "*.zip" | while read filename; do unzip -o -d "`dirname "$filename"`" "$filename"; done;
A multi-processed version for systems that can handle high I/O:
find . -name "*.zip" | xargs -P 5 -I fileName sh -c 'unzip -o -d "$(dirname "fileName")/$(basename -s .zip "fileName")" "fileName"'
A solution that correctly handles all file names (including newlines) and extracts into a directory that is at the same location as the file, just with the extension removed:
find . -iname '*.zip' -exec sh -c 'unzip -o -d "${0%.*}" "$0"' '{}' ';'
Note that you can easily make it handle more file types (such as .jar) by adding them using -o, e.g.:
find . '(' -iname '*.zip' -o -iname '*.jar' ')' -exec ...
Here's one solution that extracts all zip files to the working directory and involves the find command and a while loop:
find . -name "*.zip" | while read filename; do unzip -o -d "`basename -s .zip "$filename"`" "$filename"; done;
You could use find along with the -exec flag in a single command line to do the job
find . -name "*.zip" -exec unzip {} \;
This works exactly as we want:
Unzip files:
find . -name "*.zip" | xargs -P 5 -I FILENAME sh -c 'unzip -o -d "$(dirname "FILENAME")" "FILENAME"'
The above command does not create duplicate directories.
Remove all zip files:
find . -depth -name '*.zip' -exec rm {} \;
Something like gunzip using the -r flag? From the gzip manual:
"Travel the directory structure recursively. If any of the file names specified on the command line are directories, gzip will descend into the directory and compress all the files it finds there (or decompress them in the case of gunzip)."
http://www.computerhope.com/unix/gzip.htm
If you're using cygwin, the syntax is slightly different for the basename command.
find . -name "*.zip" | while read filename; do unzip -o -d "`basename "$filename" .zip`" "$filename"; done;
I realise this is very old, but it was among the first hits on Google when I was looking for a solution to something similar, so I'll post what I did here. My scenario is slightly different, as I basically just wanted to fully explode a jar, along with all jars contained within it, so I wrote the following bash functions:
function explode {
    local target="$1"
    echo "Exploding $target."
    if [ -f "$target" ] ; then
        explodeFile "$target"
    elif [ -d "$target" ] ; then
        # keep exploding until no archives remain anywhere under the directory
        while [ "$(find "$target" -type f -regextype posix-egrep -iregex ".*\.(zip|jar|ear|war|sar)")" != "" ] ; do
            find "$target" -type f -regextype posix-egrep -iregex ".*\.(zip|jar|ear|war|sar)" -exec bash -c 'source "<file-where-this-function-is-stored>" ; explode "{}"' \;
        done
    else
        echo "Could not find $target."
    fi
}
function explodeFile {
    local target="$1"
    echo "Exploding file $target."
    mv "$target" "$target.tmp"            # set the archive aside
    unzip -q "$target.tmp" -d "$target"   # extract into a directory with the archive's name
    rm "$target.tmp"                      # remove the archive itself
}
Note the <file-where-this-function-is-stored> part, which is needed if you're storing this in a file that is not read by non-interactive shells, as in my case. If you're storing the functions in a file that non-interactive shells do load (e.g. .bashrc, I believe), you can drop the whole source statement. Hopefully this will help someone.
A little warning: explodeFile also deletes the zipped file. You can of course change that by commenting out the last line.
Another interesting solution would be:
DESTINY=[Give the output that you intend]
# Don't forget to change from .ZIP to .zip.
# In my case the files were in .ZIP.
# The echo lines are for debug purposes.
find . -name "*.ZIP" | while read filename; do
    ADDRESS=$filename
    #echo "Address: $ADDRESS"
    BASENAME=$(basename "$filename" .ZIP)
    #echo "Basename: $BASENAME"
    unzip -d "$DESTINY$BASENAME" "$ADDRESS"
done
You can also loop through the zip files, creating a folder for each and unzipping into it:
for zipfile in *.zip; do
mkdir "${zipfile%.*}"
unzip "$zipfile" -d "${zipfile%.*}"
done
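That loop only covers the current directory; with bash 4+ the globstar option extends it to subdirectories (a sketch under that assumption):

shopt -s globstar                 # make ** match recursively (bash 4+)
for zipfile in **/*.zip; do
    mkdir -p "${zipfile%.*}"      # strip the extension to name the target folder
    unzip "$zipfile" -d "${zipfile%.*}"
done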
This works for me:
import os
from zipfile import ZipFile, is_zipfile

def unzip(zip_file, path_to_extract):
    """
    Decompress zip archives recursively

    Args:
        zip_file: name of zip archive
        path_to_extract: folder where the files will be extracted
    """
    try:
        if is_zipfile(zip_file):
            parent_file = ZipFile(zip_file)
            parent_file.extractall(path_to_extract)
            # look for archives among the extracted members and recurse
            for file_inside in parent_file.namelist():
                inner_path = os.path.join(path_to_extract, file_inside)
                if is_zipfile(inner_path):
                    unzip(inner_path, path_to_extract)
            os.remove(zip_file)
    except Exception as e:
        print(e)
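A hypothetical invocation, assuming archive.zip sits in the working directory:

unzip("archive.zip", "extracted")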
