Combine find, grep and xargs with printf

I have a find command combined with -exec grep and a -printf option:
find -L /home/blast/dirtest -maxdepth 3 -exec grep -q "pattern" {} \; -printf '%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n' 2> /dev/null
Result:
f/#/2018-01-01 10:00:00/#/191/#/filee.xml/#//#//home/blast/dirtest/01/05
I need -printf to get all the desired file information at once (type, date, size, etc.).
The above command works fine, but the -exec option is too slow compared to xargs.
I tried to do the same with xargs, but I did not succeed.
Any idea how to achieve that using xargs while keeping the desired -printf output (or something similar)?
Thanks

Your code is:
find -L /home/blast/dirtest -maxdepth 3 \
-exec grep -q "pattern" {} \; \
-printf '%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n' 2> /dev/null
This invokes a new grep process for each file.
If you are using GNU utilities, you can reduce the number of grep processes by something like:
(
format=\''%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n'\'
find -L /home/blast/dirtest -maxdepth 3 -print0 |\
xargs -0 grep -l -Z "pattern" |\
xargs -0 -r sh -c 'find "$@" -printf '"$format" --
) 2>/dev/null
For clarity, the format string is stored in a variable.
The -print0 / -0 / -Z options enable null-delimited data.
find generates the initial file list.
grep filters on "pattern" (going through xargs minimises the number of times grep is called).
The filtered file list is fed into a second xargs, which runs a minimal number of find -printf commands.
The second xargs calls a subshell so that the -printf arguments can be appended after the file names (find requires the paths to precede the operators).
The dummy argument (--) after the sh -c script is assigned to $0, which prevents the first file name from being lost.
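A quick way to see why the dummy argument matters; a minimal sketch in which the file names are just placeholders:

```shell
# With a dummy word after the script, "--" becomes $0 and both
# file names stay in "$@", so this prints 2.
sh -c 'echo "$#"' -- filee.xml other.xml

# Without it, filee.xml is consumed as $0 and only one argument
# is left, so this prints 1.
sh -c 'echo "$#"' filee.xml other.xml
```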

To do it exactly how you want:
find -L /home/blast/dirtest/ -maxdepth 3 \
-printf '%p#%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n' \
> tmp.out
cut -d# -f1 tmp.out \
| xargs grep -l "pattern" 2>/dev/null \
| sed 's/^/^/; s/$/#/' \
| grep -f /dev/stdin tmp.out \
| sed 's/^[^#]*#//'
This assumes that none of your file names contains the character #.
Instead of running grep first, it dumps every file, with the requested metadata, into a temporary file.
It also prefixes each line with the full path followed by # (%p#).
Then we extract (cut) the full paths from this list and list the files that contain the pattern (xargs grep).
We then use sed to prefix each such file name with ^ and suffix it with #, which makes it a greppable pattern in our tmp.out file.
Then we use this pattern (grep -f /dev/stdin) to extract only those paths from the big list in tmp.out.
Now all that's left is to strip the artificial full-path prefix, which the last sed command does.
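A sketch of why the prefix-stripping step must stop at the first #: the -printf format itself uses /#/ as a field separator, so a greedy match would swallow the metadata too. The sample line is assembled from the result shown in the question, assuming %p expands to the file's full path:

```shell
# One line of tmp.out: the "%p#" prefix followed by the -printf fields.
line='/home/blast/dirtest/01/05/filee.xml#f/#/2018-01-01 10:00:00/#/191/#/filee.xml/#//#//home/blast/dirtest/01/05'

# [^#]* stops at the first '#', so only the "%p#" prefix is removed.
printf '%s\n' "$line" | sed 's/^[^#]*#//'
# f/#/2018-01-01 10:00:00/#/191/#/filee.xml/#//#//home/blast/dirtest/01/05

# A greedy .* runs to the last '#' and eats the metadata as well.
printf '%s\n' "$line" | sed 's/^.*#//'
# //home/blast/dirtest/01/05
```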
Seeing how you used /home, there's a good chance you're on Linux, which, if you're willing to accept some output format changes, allows you to do it somewhat more elegantly:
find -L /home/blast/dirtest/ -maxdepth 3 \
| xargs grep -l "pattern" 2>/dev/null \
| xargs stat --printf '%F/#/%y/#/%s/#/%n\n'
The output of stat --printf is different from that of find -printf (and from that of MacOS' stat -f), but it's the same information.
Do note, however, that because you passed -L to find, and you're grepping the result:
The results are limited to file types that can be grepped, so they will never be directories, links, etc.
If you stumble upon a broken link, it will not be in the output because it cannot be grepped.
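A null-delimited variant of the stat pipeline, as a sketch assuming GNU find, grep, xargs and coreutils stat; grep_stat is a hypothetical helper name. -type f keeps directories away from grep, -print0 / -Z / -0 keep names with spaces intact, and xargs -r skips stat entirely when nothing matches:

```shell
# Hypothetical helper: grep_stat DIR PATTERN
grep_stat() {
    find -L "$1" -maxdepth 3 -type f -print0 |
        xargs -0 -r grep -lZ -e "$2" 2>/dev/null |
        xargs -0 -r stat --printf '%F/#/%y/#/%s/#/%n\n'
}

# Usage: grep_stat /home/blast/dirtest pattern
```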

I've found an interesting thing about the -exec option.
grep can be run far fewer times by ending -exec with a plus sign (+) instead of a semicolon:
-exec command {} +
This variant of the -exec option runs the specified command on the selected files, but the command line is built by appending each selected file name at the end; the total number of invocations of the command will be much less than the number of matched files. The command line is built in much the same way that xargs builds its command lines. Only one instance of '{}' is allowed within the command. The command is executed in the starting directory.
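That means that replacing this:
-exec grep -l 'pattern' {} \;
with this (the semicolon replaced by a plus sign):
-exec grep -l 'pattern' {} \+
improves performance significantly.
Because -exec ... {} + always evaluates to true, it can no longer act as a filter in front of -printf. Instead, grep -l prints the matching file names, and I pipe them into a single xargs just for the formatted output.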
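Putting this together, a sketch of the full pipeline, assuming GNU find, grep and xargs (for -printf, -Z, -0 and -r); list_matches is a hypothetical wrapper name. find -exec ... {} + runs grep in batches, and the single xargs reruns find with -maxdepth 0 on just the matching paths to print their metadata:

```shell
# Hypothetical wrapper: list_matches DIR PATTERN
# grep -lZ prints each matching file name once, NUL-terminated.
# xargs -r does nothing when nothing matched (a bare find would list ".").
# -maxdepth 0 makes the inner find print only the paths it was given.
list_matches() {
    find -L "$1" -maxdepth 3 -type f -exec grep -lZ -e "$2" {} + 2>/dev/null |
        xargs -0 -r sh -c '
            find -L "$@" -maxdepth 0 \
                -printf "%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n"
        ' sh
}

# Usage: list_matches /home/blast/dirtest pattern
```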

Related

Recursively finding files in list of directories

How do I recursively count files in a list of Linux directories?
Example:
/dog/
    a.txt
    b.txt
    c.ipynb
/cat/
    d.txt
    e.pdf
    f.png
    g.txt
/owl/
    h.txt
I want following output:
5 .txt
1 .ipynb
1 .pdf
1 .png
I tried the following, with no luck.
find . -type f | sed -n 's/..*\.//p' | sort | uniq -c
This find + gawk may work for you:
find . -type f -print0 |
awk -v RS='\0' -F/ '{sub(/^.*\./, ".", $NF); ++freq[$NF]} END {for (i in freq) print freq[i], i}'
It is safe to use -print0 in find to handle file names with whitespace, newlines and other special characters. Likewise, we use -v RS='\0' in awk so that the NUL byte is the record separator.
Use Perl one-liners to make the output in the format you need, like so:
find . -type f | perl -pe 's{.*[.]}{.}' | sort | uniq -c | perl -lane 'print join "\t", @F;' | sort -nr
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into the array @F on whitespace, or on the regex specified with the -F option.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start
Assume you have a known directory path with the subdirectories foo, bar, baz, qux, quux and gorge, and you want to count file types by extension, but only in the subdirectories foo, baz and qux.
The simplest way is:
$ find /path/{foo,baz,qux} -type f -exec sh -c 'echo "${0##*.}"' {} \; | sort | uniq -c
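The -exec part runs a small sh command per file that uses parameter expansion (${0##*.} removes everything up to and including the last dot of the path in $0) to print the extension.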
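A small sketch of what ${0##*.} does to typical paths; the paths are hypothetical examples of what find prints for /path/{foo,baz,qux}. Note what happens to a file without an extension:

```shell
for f in /path/foo/a.txt /path/baz/archive.tar.gz /path/qux/README; do
    # ${f##*.} removes the longest prefix matching "*.", i.e. everything
    # up to and including the last dot.
    echo "${f##*.}"
done
# txt
# gz
# /path/qux/README   (no dot, so nothing is removed)
```

Likewise, a dot in a directory name leaks into the result for extension-less files: /path/foo.d/README gives d/README.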

Append "/" to end of directory

Completely noob question, but: using ls piped to grep, I need to find files or directories whose names are all capitals, and directories need to have "/" appended to indicate that they are directories. Appending the "/" is the only part I am stuck on. Again, I apologize for the amateur question. I currently have ls | grep [A-Z], and the example output should be: BIRD, DOG, DOGDIR/
It's an interesting question because it's a somewhat difficult thing to accomplish with a bash one-liner.
Here's what I came up with. It doesn't seem very elegant, but I'm not sure how to improve.
find /animals -type d -or -type f \
| grep '/[A-Z]*$' \
| xargs -I + bash -c 'echo -n $(basename +)$( test -d + && echo -n /),\ ' \
| sed -e 's/, *$//'; echo
I'll break that down for you:
find /animals -type d -or -type f writes out, one per line, the directories and files it finds in /animals (see below for my test environment Dockerfile; I created /animals to match your desired output). As far as I know, find can't regex-match on the name alone (GNU find's -regex matches against the whole path), so...
grep '/[A-Z]*$' filters find's output so that only paths whose last component (the part after the final /) is all uppercase are shown.
xargs -I + bash -c '...': when you're in a shell and you want to use a "for" loop, chances are what you should be using is xargs. Learn it, know it, love it. xargs reads its input, split by default on blanks and newlines, and runs the command you give it with those items as arguments. Here, -I + makes xargs replace the literal + character with the current input line, and -I also makes xargs take the input one line at a time, so this runs one bash shell for each path that passed the grep filter. For more information, see the xargs manual page.
'echo -n $(basename +)$( test -d + && echo -n /),\ ' is the inner bash script that xargs runs for each path that got through grep.
basename + cuts the directory component off the path; from your example output you don't want e.g. /animals/DOGDIR/, you want DOGDIR/. basename is the program that trims the directories for us.
test -d + && echo -n / checks whether + (remember that xargs replaces it with the file name) is a directory, and if so, runs echo -n /. The -n argument to echo suppresses the newline, which is important to get the output in the CSV format you specified.
Putting it all together, we echo -n the output of basename +, with / appended if it's a directory, followed by a comma and an escaped space (,\ ). All the echos run with -n to suppress newlines and keep the output CSV-like.
| sed -e 's/, *$//'; echo is purely for formatting. Adding , to each individual output was an easy way to get the CSV, but it leaves us with a final , at the end of the list. The sed invocation removes a , followed by any number of spaces at the end of the output so far, i.e. the entire output from all the xargs invocations. And since we never output a newline at the end of that output, the final echo adds one.
Usually in Unix shells you wouldn't want CSV-style output. In most cases you'd instead want newline-separated output, one matching file per line, which would be somewhat simpler to produce because you wouldn't need all that faffing with -n and , to make it CSV-style. But it's a valid requirement if the need is there.
FROM debian
RUN mkdir -p /animals
WORKDIR /animals
RUN mkdir -p DOGDIR lowerdir && touch DOGDIR/DOG DOGDIR/lowerDOG2 lowerdir/BIRD
ENTRYPOINT [ "/bin/bash" ]
CMD [ "-c" , "find /animals -type d -or -type f | grep '/[A-Z]*$'| xargs -I + bash -c 'echo -n $(basename +)$( test -d + && echo -n /),\\ ' | sed -e 's/, *$//'; echo"]
$ docker run --rm test
BIRD, DOGDIR/, DOG
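If GNU find is available, here is a sketch that avoids starting a bash per path; upper_names is a hypothetical helper name. The ! -name '*[![:upper:]]*' test keeps names made only of uppercase letters, -printf appends the / for directories, and paste joins the lines. The order follows find's traversal order, so it may differ from the output above:

```shell
# Hypothetical helper: upper_names DIR
upper_names() {
    find "$1" -mindepth 1 ! -name '*[![:upper:]]*' \
        \( -type d -printf '%f/\n' -o -type f -printf '%f\n' \) |
        paste -sd, - | sed 's/,/, /g'
}

# Usage: upper_names /animals
```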
You could start with
ls -F | grep -v "[[:lower:]]"
I did not add anything for a comma-separated line, because this is the wrong method: parsing ls should be avoided! It will go wrong for filenames like
I am a terrible filename,
with newlines inside me,
and the ls command combined with grep
will only show the last line
BECAUSE THIS LINE HAS NO LOWERCASE CHARACTERS
To get the files without a pipe, you can use
shopt -s extglob
ls -dp +([[:upper:]])
shopt -u extglob
An explanation of the extglob and uppercase can be found at https://unix.stackexchange.com/a/389071/57293
When you want the output on one line, you can run into trouble with filenames that contain newlines or commas. You might want something like
# parsing ls, yes wrong and failing for some files
ls -dp +([[:upper:]]) | tr "\n" "," | sed 's/,$/\n/'
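A sketch of the one-line version without parsing ls, assuming bash; upper_csv is a hypothetical helper name. The names stay in an array until the very end, so a newline inside a name cannot split an entry (the joined line is still ambiguous for names containing ", "). extglob has to be enabled before the function is defined, because bash checks the +(...) pattern when it parses the function body:

```shell
shopt -s extglob nullglob

# Hypothetical helper: prints all-uppercase names in the current
# directory as "A, B, DIR/".
upper_csv() {
    local names i out
    names=( +([[:upper:]]) )
    for i in "${!names[@]}"; do
        if [[ -d ${names[i]} ]]; then
            names[i]+=/
        fi
    done
    out=$(printf '%s, ' "${names[@]}")
    printf '%s\n' "${out%, }"
}
```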

find and then grep and then iterate through list of files

I have the following script to replace text.
grep -l -r "originaltext" . |
while read fname
do
sed 's/originaltext/replacementText/g' $fname > tmp.tmp
mv tmp.tmp $fname
done
Now, in the first statement of this script, I want to do something like this:
find . -name '*.properties' -exec grep "originaltext" {} \;
How do I do that?
I work on AIX, so grep's --include option wouldn't work.
In general, I prefer to use find to FIND files rather than grep. It seems obvious :)
Using process substitution you can feed the while loop with the result of find:
while IFS= read -r fname
do
sed 's/originaltext/replacementText/g' "$fname" > tmp.tmp
mv tmp.tmp "$fname"
done < <(find . -name '*.properties' -exec grep -l "originaltext" {} \;)
Note that I use grep -l (a lowercase L) so that grep just returns the names of the files matching the pattern.
You could go the other way round and give the list of '*.properties' files to grep. For example:
grep -l "originaltext" `find . -name '*.properties'`
Oh, and if you're on a recent Linux distribution (with GNU grep), there is an option in grep that achieves this without building that long list of files as arguments:
grep -l "originaltext" --include='*.properties' -r .
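On AIX, where bash and process substitution may not be available, here is a sketch that only uses a plain pipe, assuming your find supports the POSIX -exec ... {} + form; replace_in_properties is a hypothetical helper name. It also writes each temporary file next to its original instead of a shared tmp.tmp:

```shell
# Hypothetical helper: run it from the top of the tree to edit.
replace_in_properties() {
    find . -name '*.properties' -exec grep -l 'originaltext' {} + |
    while IFS= read -r fname; do
        # Only replace the original if sed succeeded.
        sed 's/originaltext/replacementText/g' "$fname" > "$fname.tmp" &&
            mv "$fname.tmp" "$fname"
    done
}
```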

How to grep for files containing a specific word and pass the list of files as argument to second command?

grep rli "stringName" * | xargs <second_command> <list_of files>
Will the above code work for the functionality mentioned?
I am a beginner, so I'm not sure how to use it.
You are just missing the hyphen before grep's options. The following should work:
grep -rli "stringName" * | xargs <second_command>
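Since the above command cannot handle whitespace or other unusual characters in file names, a more robust solution is to use find:
find . -type f -exec grep -qi "stringName" {} \; -print0 | xargs -0 <second_command>
(-exec must end with \; here: the {} + form always evaluates to true, so it cannot filter the files that -print0 prints.)
Or use grep's -Z option with xargs -0: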
grep -rli "stringName" * -Z | xargs -0 <second_command>
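A small demonstration of why \; rather than + matters when -exec is used as a filter, assuming a find that follows POSIX here (GNU find does): with \; grep's exit status decides whether the next action runs, while the + form always evaluates to true. The directory and file names are throwaway examples:

```shell
# Throwaway directory with two files; only one contains the word.
d=$(mktemp -d)
echo stringName > "$d/hit.txt"
echo other > "$d/miss.txt"

# \; form: only hit.txt is printed.
find "$d" -type f -exec grep -qi stringname {} \; -print

# + form: both files are printed.
find "$d" -type f -exec grep -qi stringname {} + -print
```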
Building on jkshah's answer, which is already quite good:
find . -type f -exec grep -qi "regex" {} \; -exec "second_command" {} \;
This has the advantage of being more portable (-print0 and -0 are GNU extensions).
It executes the second command for each matching file in turn. If you want to execute it once with a list of all matching files instead, change the last \; to +.

How to copy files in shell that do not end with a certain file extension

For example, copy all files that do not end with .txt.
Bash will accept a negated pattern once the extglob option is enabled:
shopt -s extglob
cp !(*.txt) /some/other/place
You can use ls with grep's -v option:
for i in `ls | grep -v '\.txt$'`
do
cp "$i" "$dest_dir"
done
Depending on how many assumptions you can afford to make about the characters in the file names, it might be as simple as:
cp $(ls | grep -v '\.txt$') /some/other/place
If that won't work for you, then maybe find ... -print0 | xargs -0 cp ... can be used instead (though that has a problem: cp needs the destination as its last argument, but xargs appends the file names at the end).
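One way around the destination-at-the-end problem is to reuse the sh -c positional-parameter trick; a sketch assuming a find and xargs with -maxdepth, -print0, -0 and -r (GNU and BSD have them), with copy_non_txt as a hypothetical helper name. The destination is passed first and moved to the end inside the subshell:

```shell
# Hypothetical helper: copy_non_txt SRC_DIR DEST_DIR
copy_non_txt() {
    # xargs appends the file names after "$2", so inside sh -c:
    # $1 is the destination and the rest are the files to copy.
    find "$1" -maxdepth 1 -type f ! -name '*.txt' -print0 |
        xargs -0 -r sh -c 'dest=$1; shift; cp -- "$@" "$dest"' sh "$2"
}
```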
On MacOS X, xargs has an option -J that supports what is needed:
-J replstr
If this option is specified, xargs will use the data read from standard input to replace the first occurrence of replstr instead of appending that data after all other arguments. This option will not affect how many arguments will be read from input (-n), or the size of the command(s) xargs will generate (-s). The option just moves where those arguments will be placed in the command(s) that are executed. The replstr must show up as a distinct argument to xargs. It will not be recognized if, for instance, it is in the middle of a quoted string. Furthermore, only the first occurrence of the replstr will be replaced. For example, the following command will copy the list of files and directories which start with an uppercase letter in the current directory to destdir:
/bin/ls -1d [A-Z]* | xargs -J % cp -rp % destdir
GNU xargs does not appear to have -J, but it does have the related, slightly more restrictive -I option (which is also present in MacOS X):
-I replace-str
Replace occurrences of replace-str in the initial-arguments with names read from standard input. Also, unquoted blanks do not terminate input items; instead the separator is the newline character. Implies -x and -L 1.
You can rely on:
find . -not -name "*.txt"
For example, using:
find . -maxdepth 1 -type f -not -name "*.txt" -exec cp '{}' toto/ \;
copies all files in the current directory that are not .txt to the subdirectory toto/. -maxdepth 1 prevents recursion, and -type f skips directories (including . itself).
Either do:
ls | grep -v '\.txt$' | while IFS= read -r f
do
cp -- "$f" ⟨destination-directory⟩
done
or, if you have a huge number of files:
find . \! -name . -prune \! -name "*.txt" -exec cp -- "{}" ⟨destination-directory⟩ \;
Two things here are worth commenting on: the double hyphen in the invocation of cp, and the quoting of $f. The first guards against "wacky" file names that begin with a hyphen and might be interpreted as options. The second guards against file names with spaces (or other characters in IFS) in them.
In zsh:
setopt extendedglob
cp ^*.txt /some/folder
(if you just want regular files:)
cp ^*.txt(.) /some/folder
More information on zsh globbing is in the "Filename Generation" section of the zshexpn manual page.
I would do it like this, where destination is the destination directory:
ls | grep -v "\.txt$" | xargs cp -t destination
Edit: added "-t" thanks to the comments (cp -t, which takes the target directory first, is a GNU coreutils option).
