Unix - find command with tar and gzip works well from the command line but not in a script

When I use this command on the command line with hard-coded values for the parameters $dir, $year, and $month, it works very well and creates the gzip file. I checked the contents of the gz file and I see the desired results.
find $dir -name '*_$year-$month-*' -type f -print | \
xargs tar -zvcf $dir/log.$year-$month.tar.gz
But when I embed this in a shell script and run it, it fails with this error:
tar: Cowardly refusing to create an empty archive

First of all, your one-liner is full of quoting issues:
$dir will break if the directory name contains whitespace. You need "$dir" instead.
Single quotes prevent variable expansion, so '*_$year-$month-*' should probably be "*_$year-$month-*".
In your shell script, find will not match any files (you don't have any filenames containing the literal string _$year-$month-, do you?), and therefore tar will not be supplied with any files to include in the archive.
As a side note, using xargs in this particular case is dangerous: if you have too many files, xargs will call tar more than once, and any files archived in all but the last run will be lost because the archive is overwritten each time.
Additionally, this command will also break on file paths with whitespace, since by default xargs uses whitespace as the argument delimiter. Depending on the versions of the find and xargs binaries you are using, there may be a -print0 option for find and a matching -0 option for xargs to deal with this issue:
find ... -print0 | xargs -0 ...
Finally, some xargs versions have an option to avoid calling the specified command if no arguments have been supplied - for GNU xargs that is the -r option:
find ... -print0 | xargs -0 -r ...
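Putting these fixes together, a corrected version of the pipeline might look like this (a sketch, assuming GNU find, xargs, and tar; the variable names are the ones from the question):
find "$dir" -name "*_$year-$month-*" -type f -print0 | \
xargs -0 -r tar -zvcf "$dir/log.$year-$month.tar.gz"
If the file list could ever be long enough for xargs to split it across several tar runs, GNU tar's --null -T - (read a null-delimited file list from standard input) sidesteps the overwrite problem entirely:
find "$dir" -name "*_$year-$month-*" -type f -print0 | \
tar -zvcf "$dir/log.$year-$month.tar.gz" --null -T -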

Variable substitution doesn't happen within single quotes.
Use double quotes: find "$dir" -name "*_$year-$month-*" ...
(You may mix different kinds of quotes within the same argument if needed, though it is not needed here: '*_'"$year".)

If you can, echo the values of $dir, $year, and $month from the shell script to the console. It is possible that those values are different inside the script.
Secondly, which find and tar is the shell script using? Use full paths for diagnostics, or modify the value of $PATH to match what you get when you run the command from the command line.
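A minimal debugging sketch to drop into the script (the variable names are the ones from the question):
echo "dir=$dir year=$year month=$month"   # confirm the values the script actually sees
command -v find tar xargs                 # confirm which binaries will be used
echo "$PATH"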

Related

Is there a method to enter every subdirectory in a directory and perform analysis on file with certain extension?

I have one directory, with multiple subdirectories. In each subdirectory there is a file on which I want to perform analysis (code already written).
What all the subdirectories have in common is that each contains a file with the same extension, on which the analysis should be performed.
Using the Unix shell, is there a way to write commands which will:
for each subdirectory in the main directory, use the file with the certain extension and perform further commands on that file (further commands include creating some new directories and files)
repeat this for all subdirectories in the main directory and the files inside them
I will appreciate any suggestions.
Use the find command. find . -type f -name '*.txt' -exec prog \{} \; will execute program prog with the name of every file in the current directory . and below with the extension .txt (i.e. that matches the pattern *.txt). The -type f excludes directories (and pipes and devices). The -exec means execute this command; the \{} will be replaced with the filename; \; means end of command.
This definitely works if your filenames have no spaces, quote marks, or backslashes in them. If they do, it gets a little trickier: find . -type f -name '*.txt' -print0 | xargs -0 -n1 prog, assuming the filename argument goes at the end of the line. The -print0 means output the file with null termination (zero character) and the -0 means input with null termination. xargs takes its input and invokes prog for every null-terminated word. -n1 means only use one argument per invocation; you can omit it if the program accepts multiple filenames as arguments. You can use -I if you need to place the filename somewhere other than at the end of the command line.
Note: I am aware that using -exec for various obscure reasons may not be preferable for, say, secure system shell scripts, but for a use case like this it is fine.
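For the original question, here is a sketch of driving an existing analysis program over the matching file in every subdirectory (the extension .txt, the script name ./analyze.sh, and the results layout are assumptions; requires Bash):
find . -mindepth 2 -type f -name '*.txt' -print0 |
while IFS= read -r -d '' f; do
    outdir="$(dirname "$f")/results"                    # new directory next to each input file
    mkdir -p "$outdir"
    ./analyze.sh "$f" > "$outdir/$(basename "$f").out"  # run the analysis, capture its output
done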

Omit "Is a directory" results while using find command in Unix

I use the following command to find a string recursively within a directory structure.
find . -exec grep -l samplestring {} \;
But when I run the command within a large directory structure, there will be a long list of
grep: ./xxxx/xxxxx_yy/eee: Is a directory
grep: ./xxxx/xxxxx_yy/eee/local: Is a directory
grep: ./xxxx/xxxxx_yy/eee/lib: Is a directory
I want to omit the results above and just get the file names containing the string. Can someone help?
Use grep -s or grep --no-messages.
It is worth reading the portability notes in the GNU grep documentation if you are hoping to use this in multiple places, though:
-s
--no-messages
Suppress error messages about nonexistent or unreadable files. Portability note: unlike GNU grep, 7th Edition Unix grep did not conform to POSIX, because it lacked -q and its -s option behaved like GNU grep’s -q option. USG-style grep also lacked -q but its -s option behaved like GNU grep’s. Portable shell scripts should avoid both -q and -s and should redirect standard and error output to /dev/null instead. (-s is specified by POSIX.)
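The portable alternative the note describes is simply to redirect grep's error output, for example:
find . -exec grep -l samplestring {} \; 2>/dev/null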
Whenever you say find ., the utility is going to return every element within your current directory structure: files, directories, links...
If you just want to find files, just say so!
find . -type f -exec grep -l samplestring {} \;
# ^^^^^^^
However, if you just want to find all files containing a string, you can say:
grep -lR "samplestring"
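If you prefer to stay with find, using + instead of \; passes many filenames to each grep invocation instead of starting one grep per file:
find . -type f -exec grep -l samplestring {} +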
Exclude directory warnings in grep with the --exclude-dir option:
grep --exclude-dir='*' 'search-term' *
Just look at the grep --help page:
--exclude-dir=PATTERN directories that match PATTERN will be skipped.

Searching code files for a particular string

I'm using Ubuntu Karmic as my operating system. I frequently need to search my project folder for a particular string, to see if it's there in any of the files in the project folder or its subfolders.
I currently use the find command to do that, and have written a script that accepts the string I'm looking for as a parameter.
find . -exec grep -l $1 {} \;
But the problem with this is that it does not work with strings that have a space in them. So, is there any way to search for space-separated strings as well, or is there an available tool that does the job?
Thank you.
How are you invoking your script?
If you want to search for strings containing spaces, you need to do the invocation in the form:
%./script_name.sh 'search string'
and also change the find invocation to:
find . -exec grep -l "$1" {} \;
A better version of that command is simply grep -rl "$1" . (or possibly grep -rl "$*" .).
If your string contains spaces and the problem is simply the shell splitting the arguments, then you can refer to all the arguments together with "$*", and you can prevent the shell from breaking at word boundaries (while still allowing parameter expansion) by using double quotes.
grep -R "phrase with spaces" /folder/folder
find / -type d -name "folder name" 2> /dev/null
I believe you may simply do grep -lR "${1}" . to achieve what you need.
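A minimal version of the whole script, assuming a recursive grep is acceptable (the script name is arbitrary):
#!/bin/sh
# search.sh: list files under the current directory containing the given phrase;
# "$*" joins all arguments into a single space-separated string
grep -rl "$*" .
Invoke it either as ./search.sh 'some phrase' or as ./search.sh some phrase.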

Use grep --exclude/--include syntax to not grep through certain files

I'm looking for the string foo= in text files in a directory tree. It's on a common Linux machine, I have bash shell:
grep -ircl "foo=" *
In the directories are also many binary files which match "foo=". As these results are not relevant and slow down the search, I want grep to skip searching these files (mostly JPEG and PNG images). How would I do that?
I know there are the --exclude=PATTERN and --include=PATTERN options, but what is the pattern format? The man page of grep says:
--include=PATTERN Recurse in directories only searching file matching PATTERN.
--exclude=PATTERN Recurse in directories skip file matching PATTERN.
Searching on grep include, grep include exclude, grep exclude, and variants did not find anything relevant.
If there's a better way of grepping only in certain files, I'm all for it; moving the offending files is not an option. I can't search only certain directories (the directory structure is a big mess, with everything everywhere). Also, I can't install anything, so I have to make do with common tools (like grep or the suggested find).
Use the shell globbing syntax:
grep pattern -r --include=\*.cpp --include=\*.h rootdir
The syntax for --exclude is identical.
Note that the star is escaped with a backslash to prevent it from being expanded by the shell (quoting it, such as --include="*.cpp", would work just as well). Otherwise, if you had any files in the current working directory that matched the pattern, the command line would expand to something like grep pattern -r --include=foo.cpp --include=bar.cpp rootdir, which would only search files named foo.cpp and bar.cpp, which is quite likely not what you wanted.
Update 2021-03-04
I've edited the original answer to remove the use of brace expansion, which is a feature provided by several shells such as Bash and zsh to simplify patterns like this; but note that brace expansion is not POSIX shell-compliant.
The original example was:
grep pattern -r --include=\*.{cpp,h} rootdir
to search through all .cpp and .h files rooted in the directory rootdir.
If you just want to skip binary files, I suggest you look at the -I (upper case i) option. It ignores binary files. I regularly use the following command:
grep -rI --exclude-dir="\.svn" "pattern" *
It searches recursively, ignores binary files, and doesn't look inside Subversion hidden folders, for whatever pattern I want. I have it aliased as "grepsvn" on my box at work.
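If you want the same shortcut, here is a sketch of such an alias (the name is just an example) for ~/.bashrc:
alias grepsvn='grep -rI --exclude-dir=.svn'
Then grepsvn "pattern" * behaves like the command above.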
Please take a look at ack, which is designed for exactly these situations. Your example of
grep -ircl --exclude=*.{png,jpg} "foo=" *
is done with ack as
ack -icl "foo="
because ack never looks in binary files by default, and -r is on by default. And if you want only CPP and H files, then just do
ack -icl --cpp "foo="
grep 2.5.3 introduced the --exclude-dir parameter which will work the way you want.
grep -rI --exclude-dir=\.svn PATTERN .
You can also set an environment variable: GREP_OPTIONS="--exclude-dir=\.svn"
I'll second Andy's vote for ack though, it's the best.
I found this after a long time: you can add multiple includes and excludes, like:
grep "z-index" . --include=*.js --exclude=*js/lib/* --exclude=*.min.js
The suggested command:
grep -Ir --exclude="*\.svn*" "pattern" *
is conceptually wrong, because --exclude works on the basename. In other words, it will skip only the .svn in the current directory.
In grep 2.5.1 you have to add this line to ~/.bashrc or ~/.bash_profile
export GREP_OPTIONS="--exclude=\*.svn\*"
I find grepping grep's output to be very helpful sometimes:
grep -rn "foo=" . | grep -v "Binary file"
Though, that doesn't actually stop it from searching the binary files.
If you are not averse to using find, I like its -prune feature:
find [directory] \
-name "pattern_to_exclude" -prune \
-o -name "another_pattern_to_exclude" -prune \
-o -name "pattern_to_INCLUDE" -print0 \
| xargs -0 -I FILENAME grep -IR "pattern" FILENAME
On the first line, you specify the directory you want to search. . (current directory) is a valid path, for example.
On the 2nd and 3rd lines, use "*.png", "*.gif", "*.jpg", and so forth. Use as many of these -o -name "..." -prune constructs as you have patterns.
On the 4th line, you need another -o (it specifies "or" to find), the patterns you DO want, and you need either a -print or -print0 at the end of it. If you just want "everything else" that remains after pruning the *.gif, *.png, etc. images, then use
-o -print0 and you're done with the 4th line.
Finally, on the 5th line is the pipe to xargs, which takes each of the resulting files and substitutes it for the placeholder FILENAME (that is what -I FILENAME sets up). It then runs grep with the -IR flags, the "pattern", and the filename found by find in place of FILENAME.
For your particular question, the statement may look something like:
find . \
-name "*.png" -prune \
-o -name "*.gif" -prune \
-o -name "*.svn" -prune \
-o -print0 | xargs -0 -I FILES grep -IR "foo=" FILES
On CentOS 6.6/Grep 2.6.3, I have to use it like this:
grep "term" -Hnir --include \*.php --exclude-dir "*excluded_dir*"
Notice the lack of equal signs "=" (otherwise --include, --exclude, and --exclude-dir are ignored).
git grep
Use git grep, which is optimized for performance and designed to search through tracked files.
By default it ignores binary files and honors your .gitignore. If you're not working inside a Git repository, you can still use it by passing --no-index.
Example syntax:
git grep --no-index "some_pattern"
For more examples, see:
How to exclude certain directories/files from git grep search.
Check if all of multiple strings or regexes exist in a file
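A sketch of narrowing a git grep search with pathspecs (the file pattern and the excluded directory are just examples):
git grep "some_pattern" -- '*.c' ':!vendor/*'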
I'm a dilettante, granted, but here's how my ~/.bash_profile looks:
export GREP_OPTIONS="-orl --exclude-dir=.svn --exclude-dir=.cache --color=auto" GREP_COLOR='1;32'
Note that to exclude two directories, I had to use --exclude-dir twice.
If you search non-recursively you can use glob patterns to match the filenames.
grep "foo" *.{html,txt}
includes html and txt. It searches in the current directory only.
To search in the subdirectories:
grep "foo" */*.{html,txt}
In the subsubdirectories:
grep "foo" */*/*.{html,txt}
In the directories are also many binary files. I can't search only certain directories (the directory structure is a big mess). Is there's a better way of grepping only in certain files?
ripgrep
This is one of the quickest tools designed to recursively search your current directory. It is written in Rust, built on top of Rust's regex engine for maximum efficiency. Check the detailed analysis here.
So you can just run:
rg "some_pattern"
It respects your .gitignore and automatically skips hidden files/directories and binary files.
You can still customize which files and directories to include or exclude using -g/--glob. Globbing rules match .gitignore globs. Check man rg for help.
For more examples, see: How to exclude some files not matching certain extensions with grep?
On macOS, you can install via brew install ripgrep.
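A quick sketch of -g in use (the extensions and the excluded directory are just examples):
rg "some_pattern" -g '*.cpp' -g '*.h' -g '!build/*'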
find and xargs are your friends. Use them to filter the file list rather than grep's --exclude.
Try something like
find . -type f -not -name '*.png' -print | xargs grep -icl "foo="
The advantage of getting used to this, is that it is expandable to other use cases, for example to count the lines in all non-png files:
find . -type f -not -name '*.png' -print | xargs wc -l
To remove all non-png files:
find . -type f -not -name '*.png' -print | xargs rm
etc.
As pointed out in the comments, if some files may have spaces in their names, use -print0 and xargs -0 instead.
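For example, the first command with null-delimited names would be:
find . -type f -not -name '*.png' -print0 | xargs -0 grep -icl "foo="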
Try this one:
$ find . -name "*.txt" -type f -print | xargs file | grep "foo=" | cut -d: -f1
Found here: http://www.unix.com/shell-programming-scripting/42573-search-files-excluding-binary-files.html
Those scripts don't solve the whole problem. Try this instead:
du -ha | grep -i -o "\./.*" | grep -v "\.svn\|another_file\|another_folder" | xargs grep -i -n "$1"
This one is better because it uses "real" regular expressions to exclude directories from the search. Just separate the folder or file names with "\|" in the grep -v.
Enjoy!
Found on my Linux shell! XD
Look at this one:
grep --exclude="*\.svn*" -rn "foo=" * | grep -v Binary | grep -v tags
The --binary-files=without-match option to GNU grep gets it to skip binary files. (Equivalent to the -I switch mentioned elsewhere.)
(This might require a recent version of grep; 2.5.3 has it, at least.)
suitable for tcsh .alias file:
alias gisrc 'grep -I -r -i --exclude="*\.svn*" --include="*\."{mm,m,h,cc,c} \!* *'
Took me a while to figure out that the {mm,m,h,cc,c} portion should NOT be inside quotes.
~Keith
To ignore all binary results from grep
grep -Ri "pattern" * | awk '{if($1 != "Binary") print $0}'
The awk part will filter out all the "Binary file ... matches" lines.
Try this:
Create a folder named "--F" under the current directory (or symlink another folder there, renamed to "--F", i.e. double-minus-F).
#> grep -i --exclude-dir="\-\-F" "pattern" *

Unix shell file copy flattening folder structure

On the UNIX bash shell (specifically Mac OS X Leopard) what would be the simplest way to copy every file having a specific extension from a folder hierarchy (including subdirectories) to the same destination folder (without subfolders)?
Obviously there is the problem of having duplicates in the source hierarchy. I wouldn't mind if they are overwritten.
Example: I need to copy every .txt file in the following hierarchy
/foo/a.txt
/foo/x.jpg
/foo/bar/a.txt
/foo/bar/c.jpg
/foo/bar/b.txt
To a folder named 'dest' and get:
/dest/a.txt
/dest/b.txt
In bash:
find /foo -iname '*.txt' -exec cp \{\} /dest/ \;
find will find all the files under the path /foo matching the wildcard *.txt, case-insensitively (that's what -iname means). For each file, find will execute cp {} /dest/, with the found file in place of {}.
The only problem with Magnus' solution is that it forks off a new "cp" process for every file, which is not terribly efficient especially if there is a large number of files.
On Linux (or other systems with GNU coreutils) you can do:
find . -name "*.xml" -print0 | xargs -0 echo cp -t a
(The -0 allows it to work when your filenames have weird characters -- like spaces -- in them.)
Unfortunately I think Macs come with BSD-style tools. Anyone know a "standard" equivalent to the "-t" switch?
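A portable way to get the same batching without -t is to let a small shell wrapper put the destination last (a sketch; the extra sh becomes $0 inside the wrapper):
find /foo -iname '*.txt' -exec sh -c 'cp "$@" /dest/' sh {} +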
The answers above don't allow for name collisions, as the asker didn't mind files being overwritten.
I do mind files being overwritten, so I came up with a different approach. Replacing each / in the path with - keeps the hierarchy in the names and puts all the files in one flat folder.
We use find to get the list of all files, then awk to create a mv command with the original filename and the modified filename, then pass those commands to bash to be executed.
find ./from -type f | awk '{ str=$0; sub(/\.\//, "", str); gsub(/\//, "-", str); print "mv " $0 " ./to/" str }' | bash
where ./from and ./to are directories to mv from and to.
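A sketch of the same renaming idea done directly in Bash, without generating command strings (generated strings break on filenames containing spaces or quotes):
find ./from -type f -print0 | while IFS= read -r -d '' f; do
    rel=${f#./}                   # strip the leading ./
    mv "$f" "./to/${rel//\//-}"   # replace the remaining slashes with dashes
done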
If you really want to run just one command, why not cons one up and run it? Like so:
$ find /foo -name '*.txt' | xargs echo | sed -e 's/^/cp /' -e 's|$| /dest|' | bash -sx
But that won't matter too much performance-wise unless you do this a lot or have a ton of files. Be careful of name collisions, however. I noticed in testing that GNU cp at least warns of collisions:
cp: will not overwrite just-created `/dest/tubguide.tex' with `./texmf/tex/plain/tugboat/tubguide.tex'
I think the cleanest is:
$ find /foo -name '*.txt' | xargs -i cp {} /dest
Less syntax to remember than the -exec option.
As far as the man page for cp on a FreeBSD box goes, there's no need for a -t switch. cp will assume the last argument on the command line to be the target directory if more than two names are passed.
