Is there a method to enter every subdirectory in a directory and perform analysis on file with certain extension? - unix

I have one directory, with multiple subdirectories. In each subdirectory there is a file on which I want to perform analysis (code already written).
Common for all subdirectories is that they have file with same extension on which analysis should be performed.
Using Unix shell, is there a way to write a commands which will:
for each subdirectory in main directory, use file with certain extension and perform further commands on that file (further commands include creation of some new directories and files)
repeat it for all subdirectories in main directory and files inside them
I will appreciate all suggestions.

Use the find command. find . -type f -name '*.txt' -exec prog \{} \; will execute program prog with the name of every file in the current directory . and below with the extension .txt (i.e. that matches the pattern *.txt). The -type f excludes directories (and pipes and devices). The -exec means execute this command; the \{} will be replaced with the filename; \; means end of command.
This works for any filename, including names with spaces, quote marks, or backslashes, because find hands each name to prog as a single argument. An xargs pipeline is an alternative, but it needs null-terminated names to be equally safe: find . -type f -name '*.txt' -print0 | xargs -0 -n1 prog, assuming the filename argument goes at the end of the line. The -print0 means output each filename with null termination (zero character) and the -0 means read input with null termination. xargs takes its input and invokes prog for every null-terminated word. -n1 means only use one argument per invocation; you can omit it if the program accepts multiple filenames as arguments. You can use -I {} if you need to place the filename somewhere other than the end of the command (GNU xargs also accepts the older, deprecated -i).
Note: I am aware that using -exec for various obscure reasons may not be preferable for, say, secure system shell scripts, but for a use case like this it is fine.
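A minimal sketch of how the -exec form could cover the "create new directories and files per subdirectory" part of the question. The demo tree, the results directory name, and wc -l as a stand-in for the real analysis are all assumptions made for illustration:

```shell
#!/bin/sh
# Hypothetical demo tree; in practice point "main" at your own directory.
main=$(mktemp -d)
mkdir -p "$main/run one" "$main/run2"
printf 'a\nb\n' > "$main/run one/data.txt"
printf 'c\n' > "$main/run2/data.txt"

# For every *.txt file: create a results directory next to it and write
# the output of a stand-in analysis (a line count) into that directory.
find "$main" -type f -name '*.txt' -exec sh -c '
  for f do
    dir=$(dirname "$f")
    mkdir -p "$dir/results"
    wc -l < "$f" > "$dir/results/lines.out"
  done
' sh {} +
```

Passing the names through {} + and looping over them inside sh keeps names with spaces intact and starts only a few shells instead of one per file.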

Related

Unix : how to tar only N first files of each folder?

I have a folder containing 2Gb of images, with sub-folders several levels deep.
I'd like to archive only N files of each (sub) folder in a tar file. I tried to use find then tail then tar but couldn't manage to get it to work. Here is what I tried (assuming N = 10):
find . | tail -n 10 | tar -czvf backup.tar.gz
… which outputs this error:
Cannot stat: File name too long
What's wrong here? Thinking about it, even if it worked I think it would tar only the first 10 files across all folders, not the first 10 files of each folder.
How can I get the first N files of each folder?
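A proposal with some quirks: order is determined only by the order in which find emits the files, so "first" isn't well-defined. Note that the three-argument form of match() is a GNU awk extension, and -d is a GNU xargs option.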
find . -type f |
awk -v N=10 -F / 'match($0, /.*\//, m) && a[m[0]]++ < N' |
xargs -r -d '\n' tar -rvf /tmp/backup.tar
gzip /tmp/backup.tar
Comments:
use find . -type f to ensure that files have a leading directory-name prefix, so the next step can work
the awk command tracks such leading directory names, and emits full path names until N (10, here) files with the same leading directory have been emitted
use xargs to invoke tar - we're gathering regular file names, and they need to be arguments to that archiving command
xargs may invoke tar more than once, so we'll append (-r option) to a plain archive, then compress it after it's all written
Also, you may not want to write a backup file into the current directory, since you're scanning that - that's why this suggestion writes into /tmp.
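For systems without GNU awk, here is a sketch of the same idea using only POSIX awk features. The demo tree and N=2 are assumptions for illustration, sort is added only to make "first" repeatable, and the -T option (read names from a file) is assumed to be available, as it is in GNU tar and bsdtar. File names containing newlines would still break this.

```shell
#!/bin/sh
# Hypothetical demo tree: two folders with four files each.
tmp=$(mktemp -d)
mkdir -p "$tmp/src/a/b"
for i in 1 2 3 4; do
  : > "$tmp/src/a/f$i.jpg"
  : > "$tmp/src/a/b/g$i.jpg"
done

cd "$tmp/src" || exit 1
# Strip the last path component to get the directory, then print a line
# only while fewer than N files have been seen for that directory.
find . -type f | sort |
awk -v N=2 '{ dir = $0; sub(/\/[^\/]*$/, "", dir) } count[dir]++ < N' > "$tmp/list.txt"

# A single tar invocation reads the list, so no appending is needed.
tar -cf "$tmp/backup.tar" -T "$tmp/list.txt"
```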

Find and tar files on Solaris

I've got a little problem with my bash script. I'm a newbie in the Unix world, so I find it difficult to deal with an exercise. What I have to do is find files on a Solaris server with a specific name, modified within a specific time, and archive them in one .tar file. The first two points are easy, but I'm having a nightmare trying to archive them. The thing is, I constantly archive the whole directory tree (with the file at the end) into the .tar file, but I need just the file. My code looks like this:
find ~ -name "$maska" -mtime -$dni | xargs -t -L 1 tar -cvf $3 -C
where $maska is the name of the file, $dni refers to the modification time and $3 is just an archive name. I found out about the -C switch, which lets me jump into the folder where the desired file is, but when I use it with xargs, it seems just to jump there and do nothing else.
So my question is:
1) is there any possibility of achieving my goal this way?
Please remember, I don't work on gnu tar. And I HAVE TO use commands: tar, find.
Edit: I'd like to describe my problem in more detail. When I run the script for, say, file a, it should search starting from the point given in the script (it's ~ ) and everything it finds should end up in one tar file.
What I got right now is (I'm in /home/me/Scripts):
-bash-3.2$ ./Script.sh a 1000 backup
a /home/me/Program/Test/a/ 0K
a /home/me/Program/Test/a/a.c 1K
a /home/me/Program/Test/a/a.out 8K
So script has done some packing. Next I want to see my packed file, so:
-bash-3.2$ tar -tf backup
/home/me/Program/Test/a/
/home/me/Program/Test/a/a.c
/home/me/Program/Test/a/a.out
And that's the problem. The tar file has all the paths in it, so if I untar it, instead of getting just the files I wanted to archive, I will put them back in their old places. To illustrate:
-bash-3.2$ ls
Script.sh* Script.sh~* backup
-bash-3.2$ tar -xvf backup
x /home/me/Program/Test/a, 0 bytes, 0 tape blocks
x /home/me/Program/Test/a/a.c, 39 bytes, 1 tape blocks
x /home/me/Program/Test/a/a.out, 7928 bytes, 16 tape blocks
-bash-3.2$ ls
Script.sh* Script.sh~* backup
That's the problem.
So all I want is to pack all the desired files (a in the example above) in one tar file without those paths, so that it will simply untar into the directory where I run Script.sh.
I'm not sure I understand what you want, but this might be it:
find ~ -name "$maska" -mtime -$dni -exec tar cvf $3 {} +
Edit: second attempt, after you wrote that the main issue is the absolute path:
( cd ~; find . -name "$maska" -type f -mtime -$dni -exec tar cvf $3 {} + )
Edit: third attempt, after you wrote you want no path at all in the archive, maska is a directory name and $3 need to be in the current directory:
mkdir ~/foo && \
find ~ -name "$maska" -type d -mtime -$dni -exec sh -c 'ln -s $1/* ~/foo/' sh {} \; && \
( cd ~/foo ; tar chf - * ) > $3 && \
rm -rf ~/foo
Replace ~/foo by ~/somethingElse if ~/foo already exists for some reason.
Maybe you can do something like this:
#!/bin/bash
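# Note: -print0 and read -d are GNU find / bash features; a stock Solaris find may lack -print0.
find ~ -name "$maska" -mtime -"$dni" -print0 | while IFS= read -r -d '' file; do
d=$(dirname "$file")
f=$(basename "$file")
echo "$d: $f" # Show directory and file for debug purposes
tar -rvf tarball.tar -C "$d" "$f" # -C switches to the file's directory, so only the bare name is stored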
done
I don't have a Solaris box at hand for testing :-)
First of all, my assumptions:
1. "one tar file", like you said, and
2. no absolute paths, ie if you backup ~/dir/file, you should be able to test extracting it in /tmp obtaining /tmp/dir/file.
If the problem is the full paths, you should replace
find ~ # etc
with
cd ~ || exit
find . # etc
If, on the other hand, the archive name ($3) is not an absolute path, it should be something like
(
cd ~ || exit
find . -name "$maska" -mtime -"$dni" | xargs tar cf -
) > "$3"
Explanation
"(...)" runs a subshell, meaning some of the things you change in there have no effect outside of the parens; the current directory is one of them, so "(cd whatever; foo)" means you run another shell, change its current directory, run foo from there, and then you're back in your script, which never changed directory.
"cd ~ || exit" is paranoia, it means "cd ~; if that fails, exit".
"." is an alias meaning "the current directory, whatever that is"; play with "find ." vs "find ~" if you don't know what it means, you'll understand it better than if I explained it here.
"tar cf -" means that you create the tar archive on standard output; I think the syntax is portable enough, you may have to replace "-" with "/dev/stdout" or whatever works on solaris (the simplest solution is simply "tar", without the "c" command, but it's ugly to read).
The final "> $3", outside of the parens, is output redirection: rather than writing the output to the terminal, you save it into a file.
So the whole script reads like this:
- open a subshell
- change the subshell's current directory to ~
- in the subshell, find the files newer than requested, archive them, and write the contents of the resulting tar archive to standard output
- the subshell's stdout is saved to $3; because the redirection is outside the parens, relative paths are resolved relative to your script's $PWD, meaning that, e.g., if you run the script from the /tmp directory you'll get a tar archive in the /tmp directory (it would be in ~ if the redirection happened in the subshell).
If I misunderstood your question, the solution doesn't work, or the explanation isn't clear, let me know (the answer is too long, but I already know that :).
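A small self-contained sketch of the subshell-plus-redirection pattern described above. The temporary directories and the file name are made up for the demo, with $work/home standing in for ~ and backup.tar standing in for $3:

```shell
#!/bin/sh
# Hypothetical layout: $work/home plays the role of ~, $work/out is where the script runs.
work=$(mktemp -d)
mkdir -p "$work/home/dir" "$work/out"
echo hello > "$work/home/dir/file"

cd "$work/out" || exit 1
before=$PWD
(
  # Only the subshell changes directory.
  cd "$work/home" || exit
  find . -type f -name 'file' | xargs tar cf -
) > backup.tar   # relative name, resolved against the script's own directory
after=$PWD

# The archive holds a relative path, so it can be extracted anywhere.
listing=$(tar tf backup.tar)
printf '%s\n' "$listing"
```

Here before and after are the same directory, and the archive lands in $work/out even though tar ran in $work/home.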
The pax command will output tar-compatible archives and has the flexibility you need to rewrite pathnames.
find ~ -name "$maska" -mtime -$dni | pax -w -x ustar -f "$3" -s '!.*/!!'
Here are what the options mean, paraphrasing from the man page:
-w write the contents of the file operands to the standard output (or to the pathname specified by the -f option) in an archive format.
-x ustar the output archive format is the extended tar interchange format specified in the IEEE POSIX standard.
-s '!.*/!!' Modifies file operands according to the substitution expression, using regular expression syntax. Here, it deletes all characters in each file name from the beginning to the final /.
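To preview what that substitution does to each path before writing an archive, the same expression can be tried with sed, which uses the same ed-style s command. The two paths are taken from the listing in the question; this is only a dry run and does not call pax:

```shell
#!/bin/sh
# Apply the substitution to sample paths: everything up to the last / is deleted.
preview=$(printf '%s\n' \
  /home/me/Program/Test/a/a.c \
  /home/me/Program/Test/a/a.out | sed 's!.*/!!')
printf '%s\n' "$preview"
```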

Unix to find pdf files from list in text file

I have a directory (for Endnote) that is filled with PDF files (1000's of them). I have used Unix to print a list of all of the pdf files and saved this list as a text file. Most of these pdf files are located in other directories throughout my computer (duplicates).
Now, I want to use the find command to search for duplicates of these pdf files throughout the rest of my computer and if a duplicate is found, move it to a new directory. If a specific file name is found more than once, I want to give each a unique name (ie basename.pdf.1, basename.pdf.2 etc). At the end, I want a single directory for all duplicates so I can double check them and then delete).
However, I do not want find to search the directory in which my list was made from or my Dropbox, as I do not want to move these pdf files (only move the other pdfs scattered throughout my computer).
I have found (I think) how to do all of the individual steps that I need to complete this task, but I cannot seem to put everything together into a working Unix command.
1) In order to find files while excluding a directory:
find -name "what to search for" -not -path "excluded_directory"
or
find build -not \( -path excluded_directory1 -prune \) -not \( -path excluded_directory2 -prune \) -name \*.what_to_find
or my current favorite
find . -name '*.what_to_find' | grep -v excludeddir1 | grep -v excludeddir2
2) In order to read a text file into find and use the lines as search patterns:
find . -type f -print | fgrep -f file_list.txt
3) to find and move files
find / -iname "*.what_to_find" -type f -exec mv {} /new_directory \;
or
find / -iname "*.what_to_find" -type f | xargs -I '{}' mv '{}' /new_directory
or (to rename files so files with same name are not just overwritten by each other). I haven't quite figured everything going on in this command out yet...
find -name '*.what_to_find' -type f -exec bash -c 'mv -v "$0" "./$( mktemp "$( basename "$0" ).XXX" )"' '{}' \;
So, I can execute this commands individually, but have not been able to get them to work together as desired (maybe my order of commands is wrong? other problems?).
find . type f -print | fgrep -f file_list.txt | grep -v excludeddir1 | grep -v excludeddir2 -exec bash -c 'echo mv -v "$0" "./$( mktemp "$( basename "$0" ).XXX" )"' '{}' \;
Any help is much appreciated!
Thanks,
Derrick
Well I wasn't able to complete this task exactly how I wanted to, but I found a work around that got the job done.
I printed a list of all PDFs I have in Endnote, then deleted the path name, leaving just the file names (find and replace function in text wrangler). I then used the find command to search this list against my computer, printing all occurrences of each PDF.
Then in text wrangler, I deleted all lines containing the initial path to my endnote PDFs, leaving just the desired duplicates.
Next, I used the find command to search for these exact paths and move them to a new folder.
All in all, I got by with the exact same commands I have in my original post, and a little help from text wrangler. Unfortunately I never figured out how to combine all my desired steps into a single Unix command.
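For anyone landing here later, one way the individual steps might be combined is sketched below. It is only a sketch under assumptions: the directory names (endnote, Dropbox, dupes) and the demo files are invented, file_list.txt is assumed to hold bare file names one per line, and paths containing newlines are not handled.

```shell
#!/bin/sh
# Hypothetical demo tree.
root=$(mktemp -d)
mkdir -p "$root/endnote" "$root/Dropbox" "$root/docs/old" "$root/dupes"
for d in endnote Dropbox docs docs/old; do echo x > "$root/$d/paper.pdf"; done
echo y > "$root/docs/other.pdf"
printf 'paper.pdf\n' > "$root/file_list.txt"

# -prune skips the excluded directories entirely; everything else is searched.
find "$root" \( -path "$root/endnote" -o -path "$root/Dropbox" -o -path "$root/dupes" \) -prune \
     -o -type f -name '*.pdf' -print |
while IFS= read -r path; do
  name=$(basename "$path")
  # Keep only files whose exact name appears in the list.
  grep -Fxq -- "$name" "$root/file_list.txt" || continue
  # Find the first unused numeric suffix so duplicates never overwrite each other.
  n=1
  while [ -e "$root/dupes/$name.$n" ]; do n=$((n + 1)); done
  mv -- "$path" "$root/dupes/$name.$n"
done
```

Doing the filtering inside the loop avoids the problem in the combined command from the question, where -exec was attached to grep rather than to find.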

Unix - Find command with tar and gz works well from command line but not working on Script

When I use this command on the command line with hard-coded values for the parameters $dir, $year, $month, it works very well and creates the gzip file. I checked the contents of the gz file and I see the desired results.
find $dir -name '*_$year-$month-*' -type f -print | \
xargs tar -zvcf $dir/log.$year-$month.tar.gz
But when I embed this in a shell script and run it, it fails with this error:
tar: Cowardly refusing to create an empty archive
First of all, your one-liner is full of quoting issues:
$dir will break if the directory name contains whitespace. You need "$dir" instead
Single quotes prevent variable expansion - '*_$year-$month-*' should probably be "*_$year-$month-*".
In your shell script code, find will not match any files (you don't have any filenames with the string _$year-$month-, do you?) and therefore tar will not be supplied with any files to include in the archive.
On a sidenote, using xargs in this particular case is dangerous - if you have too many files, xargs will call tar more than once and any files archived in all but the last run will be erased from the archive as it is overwritten.
Additionally, this command will also break on file paths with whitespace - by default xargs uses whitespace as the argument delimiter. Depending on the version of find and xargs binaries that you are using, there may be a -print0 option for find and a matching -0 option for xargs to deal with this issue:
find ... -print0 | xargs -0 ...
Finally, some xargs versions have an option to avoid calling the specified command if no arguments have been supplied - for GNU xargs that is the -r option:
find ... -print0 | xargs -0 -r ...
Variable substitution doesn't happen within single quotes.
Use double quotes: find "$dir" -name "*_$year-$month-*" ...
(You may use different kinds of quotes within the same argument if it's needed (not here): '*_'"$year")
If you can, attempt to echo the values of $dir, $year, $month from the shellscript onto the console. It is possible that those values are different in the shellscript.
Secondly, which find and tar is the shell script using? Try using full paths for diagnostics, or modify the value of $PATH to match what you get when you run the command from the command line.
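A short sketch that makes the quoting difference visible and then builds the archive with a single find command. The log file names, year and month are invented for the demo. Note that with a very large number of matches, -exec ... {} + can still split the list across several tar invocations, so this has the same overwrite caveat as xargs.

```shell
#!/bin/sh
# Hypothetical demo data.
dir=$(mktemp -d)
year=2011
month=07
: > "$dir/app_2011-07-01.log"
: > "$dir/app_2011-08-01.log"

single='*_$year-$month-*'   # stays literal: find would look for "$year" in file names
double="*_$year-$month-*"   # expands to *_2011-07-*
printf 'single quotes: %s\ndouble quotes: %s\n' "$single" "$double"

(
  # Work from inside the directory so the archive stores relative names.
  cd "$dir" || exit
  find . -type f -name "$double" -exec tar -czf "log.$year-$month.tar.gz" {} +
)
archived=$(tar -tzf "$dir/log.$year-$month.tar.gz" | wc -l | tr -d ' ')
```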

Unix shell file copy flattening folder structure

On the UNIX bash shell (specifically Mac OS X Leopard) what would be the simplest way to copy every file having a specific extension from a folder hierarchy (including subdirectories) to the same destination folder (without subfolders)?
Obviously there is the problem of having duplicates in the source hierarchy. I wouldn't mind if they are overwritten.
Example: I need to copy every .txt file in the following hierarchy
/foo/a.txt
/foo/x.jpg
/foo/bar/a.txt
/foo/bar/c.jpg
/foo/bar/b.txt
To a folder named 'dest' and get:
/dest/a.txt
/dest/b.txt
In bash:
find /foo -iname '*.txt' -exec cp \{\} /dest/ \;
find will find all the files under the path /foo matching the wildcard *.txt, case insensitively (That's what -iname means). For each file, find will execute cp {} /dest/, with the found file in place of {}.
The only problem with Magnus' solution is that it forks off a new "cp" process for every file, which is not terribly efficient especially if there is a large number of files.
On Linux (or other systems with GNU coreutils) you can do:
find . -name "*.xml" -print0 | xargs -0 echo cp -t a
(The -0 allows it to work when your filenames have weird characters -- like spaces -- in them.)
Unfortunately I think Macs come with BSD-style tools. Anyone know a "standard" equivalent to the "-t" switch?
The answers above don't allow for name collisions as the asker didn't mind files being over-written.
I do mind files being over-written, so I came up with a different approach. Replacing each / in the path with - keeps the hierarchy in the names, and puts all the files in one flat folder.
We use find to get the list of all files, then awk to create a mv command with the original filename and the modified filename then pass those to bash to be executed.
find ./from -type f | awk '{ str=$0; sub(/\.\//, "", str); gsub(/\//, "-", str); print "mv " $0 " ./to/" str }' | bash
where ./from and ./to are directories to mv from and to.
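Because the generated mv commands are not quoted, names with spaces or shell metacharacters would be misread by bash. A sketch of the same renaming idea without generating shell code is below; the demo tree is made up, and names containing newlines are still not handled.

```shell
#!/bin/sh
# Hypothetical demo tree with a name collision and a name containing a space.
base=$(mktemp -d)
mkdir -p "$base/from/bar" "$base/to"
echo 1 > "$base/from/a.txt"
echo 2 > "$base/from/bar/a.txt"
echo 3 > "$base/from/bar/my notes.txt"

cd "$base" || exit 1
find ./from -type f | while IFS= read -r src; do
  rel=${src#./from/}                          # drop the leading ./from/
  flat=$(printf '%s' "$rel" | tr '/' '-')     # bar/a.txt becomes bar-a.txt
  mv -- "$src" "./to/$flat"
done
```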
If you really want to run just one command, why not cons one up and run it? Like so:
$ find /foo -name '*.txt' | xargs echo | sed -e 's/^/cp /' -e 's|$| /dest|' | bash -sx
But that won't matter too much performance-wise unless you do this a lot or have a ton of files. Be careful of name collisions, however. I noticed in testing that GNU cp at least warns of collisions:
cp: will not overwrite just-created `/dest/tubguide.tex' with `./texmf/tex/plain/tugboat/tubguide.tex'
I think the cleanest is:
$ find /foo -name '*.txt' | xargs -i cp {} /dest
Less syntax to remember than the -exec option.
As far as the man page for cp on a FreeBSD box goes, there's no need for a -t switch. cp will assume the last argument on the command line to be the target directory if more than two names are passed.
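Building on that, here is a sketch of a batched copy that does not rely on -t: a small sh wrapper receives the destination first and puts it last on the cp command line. The demo tree is an assumption and deliberately contains no duplicate names, since GNU cp refuses to overwrite a file it has just created within the same invocation.

```shell
#!/bin/sh
# Hypothetical demo tree.
base=$(mktemp -d)
mkdir -p "$base/foo/bar" "$base/dest"
echo top > "$base/foo/a.txt"
: > "$base/foo/x.jpg"
echo b > "$base/foo/bar/b.txt"
: > "$base/foo/bar/c.jpg"

# $1 is the destination; after shift, "$@" holds the batch of found files.
find "$base/foo" -type f -name '*.txt' \
  -exec sh -c 'dest=$1; shift; cp -- "$@" "$dest"' sh "$base/dest" {} +

copied=$(ls "$base/dest" | sort | tr '\n' ' ')
printf '%s\n' "$copied"
```

This keeps the one-process-per-batch efficiency of xargs while passing file names as real arguments, so spaces in names are not a problem.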
