Unix: run 2 commands which take input from a pipe?

Updated :
Thanks for your answers. As I said, my question was just a translation of my use case.
Let me go into more detail about what I want to achieve.
In my dev environment, we use "ade" as our version control system. What I want to do is:
ade describetrans | awk '/myapps/{ print $2 }' | sort -fr | xargs -iF ade unbranch F
Now, every single time I run the unbranch command, a new file/dir gets checked out, so I need to run ade checkin -all after all my unbranch commands. So I need something like:
"pipeline up to sort" | xargs -iF (ade unbranch F + ade checkin -all)
Is there any way to run 2 commands on the output of a pipe?
Thanks
Original question asked :
I can translate the use case I have into the following:
I need to get the 1st line of a file. I do
cat file | head -1
Now I want to do this on a list of files. How do I do the following in one Unix command?
Eg :
find . -name "*.log" | ( xargs -iF cat F | head -1 )
Obviously the brackets in the above command do not work.
Is there a way to pipe the output of the find command and run 2 commands on it (cat and head)? Tried using ; and && but that didn't help.
I can create a script - but wanted to do this in one command.
Again - this is just a translation of the case I have.
thanks
Rohan

First of all, head accepts more than one file name, so you can simply write:
head -q -n 1 *.log
or:
find . -name '*.log' -exec head -n 1 '{}' ';'
However, if you really need to duplicate the stream, then use tee and do something along the lines of:
wget -O - http://example.com/dvd.iso | tee >(sha1sum > dvd.sha1) > dvd.iso
This example is taken from info coreutils 'tee invocation'.
UPDATE (following the ade comment): In your case tee will not work. You need to perform a task after another task finishes, and tee will trigger the tasks more or less simultaneously (modulo buffering).
Another approach will work, provided that there are no spaces in the input lines (a serious issue, I know, but I'm not sure how to overcome it right now). I'll start with a generic solution:
echo -e 'Foo\nBar\nBaz' | ( for i in `cat` ; do echo 1$i ; echo 2$i ; done )
Here, echo -e 'Foo\nBar\nBaz' creates a sample multi-line input, echo 1$i is the first command to run, and echo 2$i is the second.
I'm not familiar with ade and I don't have it installed on my system, but I'm guessing that in your case something like this might work:
ade describetrans | awk '/myapps/{ print $2 }' | sort -fr | ( for i in `cat` ; do ade unbranch $i ; ade checkin -all $i ; done )
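If the input lines may contain spaces (the limitation mentioned above), a while read loop avoids the word splitting done by `cat` in backquotes; a minimal sketch along the same lines:
ade describetrans | awk '/myapps/{ print $2 }' | sort -fr | ( while IFS= read -r f ; do ade unbranch "$f" ; ade checkin -all "$f" ; done )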

Useless use of cat. Simply pass the file names to head:
find . -name '*.log' -exec head -n 1 '{}' '+'
or
find . -name '*.log' -print0 | xargs -0 head -n 1
EDIT: This will print headers for each file. On Linux, this can be suppressed with -q, but on other systems you must make sure that head is called with a single argument:
find . -name '*.log' -exec head -n 1 '{}' ';'

You're making this much more complicated than it needs to be by piping input into head rather than simply giving it filenames:
head -n1 `find -X . -name '*.log' | xargs`
If you don't need to traverse subdirectories, you can make it even simpler:
head -n1 *.log
You can screen out the filename headers and blank lines by piping through grep: | grep -Ev '^(==> .* <==)?$'
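Putting the pieces together for the recursive case (note that -X is BSD find; with GNU find just drop it, and this still breaks on file names containing spaces):
head -n1 `find -X . -name '*.log' | xargs` | grep -Ev '^(==> .* <==)?$'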

Try
ade describetrans | awk '/myapps/{ print $2 }' | sort -fr | xargs sh -c 'ade unbranch "$@"; ade checkin -all "$@"' arg0
This is assuming that ade accepts multiple files at once and ade checkin -all needs to be called for every file.
The string arg0 supplies the value of $0 in the -c string.
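You can see how the placeholder behaves with echo standing in for ade:
printf '%s\n' foo bar | xargs sh -c 'echo unbranch: "$@"; echo checkin: "$@"' arg0
This prints unbranch: foo bar followed by checkin: foo bar.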

I do not know ade, so I will have to guess what the 2 commands you want to run really are. But if you have GNU Parallel (http://www.gnu.org/software/parallel/) installed, one of these should work:
ade describetrans | awk '/myapps/{ print $2 }' | sort -fr |
parallel -j1 ade unbranch {} ";" ade checkin -all {}
ade describetrans | awk '/myapps/{ print $2 }' | sort -fr |
parallel -j1 ade unbranch {} ";" ade checkin -all
ade describetrans | awk '/myapps/{ print $2 }' | sort -fr |
parallel -j1 ade unbranch {} ; ade checkin -all
If ade can be run in parallel, you can remove -j1.
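A harmless way to try the first variant, with echo standing in for ade:
printf '%s\n' a b c | parallel -j1 echo unbranch {} ";" echo checkin -all {}
This runs echo unbranch and then echo checkin -all for each input line in turn.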
Watch the intro video for GNU Parallel to learn more:
http://www.youtube.com/watch?v=OpaiGYxkSuQ

Related

Recursively finding files in list of directories

How do I recursively count files in a list of Linux directories?
Example:
/dog/
  /a.txt
  /b.txt
  /c.ipynb
/cat/
  /d.txt
  /e.pdf
  /f.png
  /g.txt
/owl/
  /h.txt
I want the following output:
5 .txt
1 .ipynb
1 .pdf
1 .png
I tried the following, with no luck.
find . -type f | sed -n 's/..*\.//p' | sort | uniq -c
This find + gawk may work for you:
find . -type f -print0 |
awk -v RS='\0' -F/ '{sub(/^.*\./, ".", $NF); ++freq[$NF]} END {for (i in freq) print freq[i], i}'
Using -print0 in find makes it safe to handle file names with whitespace and other special characters. Likewise, -v RS='\0' in awk makes the NUL byte the record separator.
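For the sample tree in the question this prints (the order of awk's for (i in freq) loop is unspecified):
5 .txt
1 .ipynb
1 .pdf
1 .png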
Use Perl one-liners to make the output in the format you need, like so:
find . -type f | perl -pe 's{.*[.]}{.}' | sort | uniq -c | perl -lane 'print join "\t", @F;' | sort -nr
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in the -F option.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start
Assume you have a directory path with the subdirectories foo, bar, baz, qux, quux and gorge, and you want to count the file types by extension, but only for the subdirectories foo, baz and qux.
The simplest is to just do:
$ find /path/{foo,baz,qux} -type f -exec sh -c 'echo "${0##*.}"' {} \; | sort | uniq -c
The -exec part just uses simple sh parameter expansion to print the extension.
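Note that ${0##*.} prints the extension without its leading dot and spawns one sh per file. A sketch that batches files and keeps the dot, assuming the same /path layout:
$ find /path/{foo,baz,qux} -type f -exec sh -c 'for f; do printf ".%s\n" "${f##*.}"; done' -- {} + | sort | uniq -c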

Finding and sorting files by size in Unix

I want to create a shell function that takes 2 parameters, a directory name and a file name, and does the following: it searches for the file name starting in the given directory, then continues the search in all subdirectories of that directory. I want the output to be every parent directory where the file name has been found, sorted by the size of the matched file.
Help would be much appreciated, thanks.
Not sure which Unix you're asking about, but for Linux and probably other common Unix systems:
find <directory> -name "<filename>" -ls | sort -k 7 -n -r | awk '{print $NF}' | xargs -n 1 dirname
sort => sort by file size (the 7th column of find -ls output is the file size)
awk => print the matched file's full path (the last field)
dirname => get the parent directory of each matched file
Example:
# Find parent directory of all types.h under /usr/include, sorted by file size in desc order
$ find /usr/include/ -name "types.h" -ls | sort -k 7 -n -r | awk '{print $NF}' | xargs -n 1 dirname
/usr/include/x86_64-linux-gnu/bits
/usr/include/x86_64-linux-gnu/sys
/usr/include/c++/7/parallel
/usr/include/rpc
/usr/include/linux/sched
/usr/include/linux/iio
/usr/include/linux
/usr/include/asm-generic
/usr/include/x86_64-linux-gnu/asm

Combine find, grep and xargs with printf

I have a find command combining -exec grep and a -printf option:
find -L /home/blast/dirtest -maxdepth 3 -exec grep -q "pattern" {} \; -printf '%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n' 2> /dev/null
Result :
f/#/2018-01-01 10:00:00/#/191/#/filee.xml/#//#//home/blast/dirtest/01/05
I need the printf to get all the desired file information at once (date, type, size, etc.).
The above command works fine, but the -exec option is too slow compared to xargs.
I tried to do the same with xargs but did not succeed.
Any idea how to achieve that using xargs, keeping the desired printf (or similar)?
Thanks
Your code is:
find -L /home/blast/dirtest -maxdepth 3 \
-exec grep -q "pattern" {} \; \
-printf '%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n' 2> /dev/null
This invokes a new grep process for each file.
If you are using GNU utilities, you can reduce the number of grep processes by something like:
(
format=\''%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n'\'
find -L /home/blast/dirtest -maxdepth 3 -print0 |\
xargs -0 grep -l -Z "pattern" |\
xargs -0 sh -c 'find "$@" -printf '"$format" --
) 2>/dev/null
for clarity, store the format string in a variable
use the -print0 / -0 / -Z options to enable null-delimited data
generate the initial file list with find
filter on "pattern" with grep (the use of xargs minimises the number of times grep gets called)
feed the filtered file list into another xargs to run a minimal number of find -printf invocations
in the second xargs, call a subshell so that extra arguments can be appended (find requires the paths to precede the operators)
the dummy second argument (--) to the sh -c invocation prevents the first filename being lost due to assignment to $0
To do it exactly how you want:
find -L /home/blast/dirtest/ -maxdepth 3 \
-printf '%p#%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n' \
> tmp.out
cut -d# -f1 tmp.out \
| xargs grep -l "pattern" 2>/dev/null \
| sed 's/^/^/; s/$/#/' \
| grep -f /dev/stdin tmp.out \
| sed 's/^.*#//'
This operates under the assumption that you have no character # in your file names.
What it does is avoid the grep at first and just dump all the files with the requested metadata to a temporary file.
But it also prefixes each line with the full path (%p#).
Then we extract (cut) the full paths out of this list and list the files which contains the pattern (xargs grep).
We then use sed to prefix each such file name with ^ and suffix it with #, which makes it a greppable pattern in our tmp.out file.
Then we use this pattern (grep -f /dev/stdin) to extract only those paths from the big list in tmp.out.
Now all that's left is to remove the artificial full path we prefixed using the last sed command.
Seeing how you used /home, there's a good chance you're on Linux, which, if you're willing to accept some output format changes, allows you to do it somewhat more elegantly:
find -L /home/blast/dirtest/ -maxdepth 3 \
| xargs grep -l "pattern" 2>/dev/null \
| xargs stat --printf '%F/#/%y/#/%s/#/%n\n'
The output of stat --printf is different from that of find -printf (and from that of MacOS' stat -f), but it's the same information.
Do note, however, that because you passed -L to find, and you're grepping the result:
The results are limited to file types which can be grepped, so they will never be directories, links, etc.
If you stumble upon a broken link, it will not be in the output because it cannot be grepped.
I've found an interesting thing about the -exec option.
You can run grep just once by using -exec with the plus sign (+):
-exec command {} +
This variant of the -exec option runs the specified command on the selected files, but the command line is built by appending each selected file name at the end; the total number of invocations of the command will be much less than the number of matched files. The command line is built in much the same way that xargs builds its command lines. Only one instance of '{}' is allowed within the command. The command is executed in the starting directory.
That means if I change this:
-exec grep -l 'pattern' {} \;
to this (replacing the semicolon with the plus sign):
-exec grep -l 'pattern' {} +
the performance improves significantly.
Then I only need to pipe through a single xargs for the printf formatting.
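For instance (a sketch, assuming GNU find and file names without whitespace), the whole pipeline could become:
find -L /home/blast/dirtest -maxdepth 3 -exec grep -l 'pattern' {} + 2>/dev/null \
| xargs sh -c 'find "$@" -printf "%y/#/%TY-%Tm-%Td %TX/#/%s/#/%f/#/%l/#/%h\n"' --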

How to grep for files containing a specific word and pass the list of files as argument to second command?

grep rli "stringName" * | xargs <second_command> <list_of files>
Will the above code work for the functionality mentioned?
I am a beginner, so I'm not sure how to use it.
You are just missing the hyphen for the options to grep. The following should work:
grep -rli "stringName" * | xargs <second_command>
Since the above command cannot handle whitespace or weird characters in file names, a more robust solution is to use find:
find . -type f -exec grep -qi "stringName" {} \; -print0 | xargs -0 <second_command>
Or use grep's -Z option together with xargs -0:
grep -rliZ "stringName" * | xargs -0 <second_command>
Extending on jkshah's answer, which is already quite good.
find . -type f -exec grep -qi "regex" {} \; -exec "second_command" {} \;
This has the advantage of being more portable (-print0 and -0 are gnu extensions).
It executes the second command for each matching file in turn. If you want to execute with a list of all matching files at the end instead, change the last \; to +
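That is, a sketch of the batched form:
find . -type f -exec grep -qi "regex" {} \; -exec "second_command" {} +
Here grep still runs once per file as the filter, but second_command is invoked with as many matching files as possible appended to each invocation.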

UNIX find for finding file names not paired by a file with a different extension

Is there a simple way to recursively find all files in a directory hierarchy, that do not have a matching file with a different extension?
For example the directory has a bunch of files ending in .dat
I want to find the .dat files that do not have an accompanying .out file.
I have a while loop that checks each entry, but that is slow for long lists...
I am using GNU find.
Perhaps something like this?
find . -name "*.dat" -print | sort > column1.txt
find . -name "*.out" -print | sort > column2.txt
diff column1.txt column2.txt
I haven't tested it, but I think it's probably close to what you're asking for.
find . -name '*.dat' -printf "[ -f %p ] || echo %p\n" | sed 's/\.dat/.out/' | sh
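To unpack that: find emits a line [ -f ./x.dat ] || echo ./x.dat for each .dat file, sed rewrites only the first .dat occurrence (the one inside the -f test) to .out, and sh then echoes each .dat file whose .out counterpart is missing. A whitespace-safe sketch of the same idea:
find . -name '*.dat' -exec sh -c 'for f; do [ -f "${f%.dat}.out" ] || printf "%s\n" "$f"; done' -- {} +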
I had to add a bunch of bells and whistles to the 1st solution, but that was a good start, thanks...
find . -print | grep -Fi '.dat' | grep -vFi '.dat.' | sort | sed -e 's/.dat//g' > column1.txt
find . -print | grep -Fi '.out' | grep -vFi '.out.' | sort | sed -e 's/.out//g' > column2.txt
sdiff -s column1.txt column2.txt | grep -F '<' | cut -f1 -d"<" > c12diff.txt
