Filenames and line numbers for the matches of cat and grep - unix

My code
$ *.php | grep google
How can I print the filenames and line numbers next to each match?

grep google *.php
if you want to span many directories:
find . -name \*.php -print0 | xargs -0 grep -n -H google
(as explained in comments, -H is useful if xargs comes up with only one remaining file)

You shouldn't be doing
$ *.php | grep
That means "take the first file matching *.php and run it as a command, with the rest of the matches as its arguments, then pipe its output to grep".
It should be:
$ grep -n -H "google" *.php
The -n flag tells it to print line numbers, and the -H flag tells it to display the filename even if there's only one file. Grep defaults to showing filenames when multiple files are being searched, but you probably want the output to be consistent regardless of how many matching files there are.
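As a rough illustration (file names and contents here are made up), the output format is filename:line-number:matching-line:
$ grep -n -H "google" index.php util.php
index.php:12:$url = "http://www.google.com/search?q=" . $q;
util.php:3:// see the google docs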

grep -RH "google" *.php

Please take a look at ack at http://betterthangrep.com. The equivalent in ack of what you're trying is:
ack google --php

find ./*.php -exec grep -l 'google' {} \;

Use "man grep" to see other features.
for i in $(ls *.php); do grep -n --with-filename "google" $i; done;

find . -name "*.php" -print | xargs grep -n "searchstring"

Related

How to perform a static count of loaded packages in R?

I'd like to search a directory structure to count the number of times I've loaded various R packages. The source is contained in .org and .R files. I'm willing to assume that "library(" is the first non-blank entry on any line I care about, and I'm willing to assume that there is at most only one such call per line.
find . -regex ".*/.*\.org" -print
gets me a list of .org files, and
find . -regex ".*\.\(org\|R\)$" -print
gets me a list of .org and .R files (thanks to https://unix.stackexchange.com/questions/15308/how-to-use-find-command-to-search-for-multiple-extensions).
Given a particular file,
grep -h "library(" file | sed 's/library(//' | sed 's/)//'
gets me the package name. I'd like to hook them together and then possibly redirect the output to a file, from which I can use R to calculate frequencies.
The seemingly straightforward
find . -regex ".*/.*\.org" -print | xargs -0 grep -h "library(" | sed 's/library(//' | sed 's/)//'
doesn't work; I get
find . -regex ".*/.*\.org" -print | xargs -0 grep -h "library(" | sed 's/library(//' | sed 's/)//'
Usage: /usr/bin/grep [OPTION]... PATTERN [FILE]...
Try '/usr/bin/grep --help' for more information.
and I'm not sure what to do next.
I also tried
find . -regex ".*/.*\.org" -exec grep -h "library(" "{}" "\;"
and got
find . -regex ".*/.*\.org" -exec grep -h "library(" "{}" "\;"
find: missing argument to `-exec'
It seems simple. What am I missing?
UPDATE: Adding -t to the above xargs shows me the first command:
grep -h library ./dirname/filename.org
followed by, presumably, a list of all the matching files with paths relative to the PWD. Actually, that works if I only search for .org files; if I add .R files, too, I get "xargs: argument line too long". I think that means xargs is passing the entire list of files as the argument to one invocation of grep.
find ... -print | xargs OK
find ... -print0 | xargs -0 OK
find ... -print0 | xargs broken
find ... -print | xargs -0 broken (what you used)
Also, please don't:
grep -h "library(" | sed 's/library(//' | sed 's/)//'
when this is faster:
grep -h "library(" | sed -e 's/library(//' -e 's/)//'
and this is even faster, and more interesting:
grep -h "library(" | grep -o '(.*)' | tr -d ' ()'

find + sed, filename output

I have a directory, D:/Temp, containing a lot of subfolders with text files. Each folder has a "file.txt", and some of these file.txt files contain the word "pattern". I would like to check how many files match, and also get the file path of each matching file.txt:
find D:/Temp -type f -name "file.txt" -exec basename {} cat {} \; | sed -n '/pattern/p' | wc -l
Output should be:
4
D:/Temp/abc1/file.txt
D:/Temp/abc2/file.txt
D:/Temp/abc3/file.txt
D:/Temp/abc4/file.txt
Or similar.
You could use GNU grep :
grep -lr --include file.txt "pattern" "D:/Temp/"
This will return the file paths.
grep -cr --include file.txt "pattern" "D:/Temp/"
This will return a count per file (counting matching lines in each file rather than the number of matching files).
Explanation of the flags :
-r makes grep browse its target recursively, so the target can be a directory
--include <glob> makes grep restrict its recursive browsing to files matching the <glob>.
-l makes grep return only the file paths. Additionally, it stops reading a file as soon as it has encountered the pattern.
-c makes grep only return the number of matches
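To get output shaped like the question's example (the count first, then the matching paths), a small sketch combining the two might be:
matches=$(grep -lr --include file.txt "pattern" "D:/Temp/")
printf '%s\n' "$matches" | grep -c .   # number of matching files (0 if none)
printf '%s\n' "$matches"               # their paths (empty when nothing matched)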
If your file names don't contain spaces then all you need is:
awk '/pattern/{print FILENAME; cnt++; nextfile} END{print cnt+0}' $(find D:/Temp -type f -name "file.txt")
The above used GNU awk for nextfile.
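If GNU awk isn't available, a portable (though slower, since it keeps reading each file to the end) sketch might be:
awk '/pattern/ && !seen[FILENAME]++ {print FILENAME; cnt++} END {print cnt+0}' $(find D:/Temp -type f -name "file.txt")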
I'd propose using two commands: one to find all the files:
find ./ -name "file.txt" -exec fgrep -l "-pattern" {} \;
Another for counting them:
find ./ -name "file.txt" -exec fgrep -l "-pattern" {} \; | wc -l
Previously I've used:
grep -Hc "pattern" $(find D:/temp -type f -name "file.txt")
This will only work if at least one file.txt is found. Otherwise you could use the following, which handles both the found and not-found cases:
searchFiles=$(find D:/temp -type f -name "file.txt"); [[ ! -z "$searchFiles" ]] && grep -Hc "pattern" $searchFiles
The output for this would look more like:
D:/Temp/abc1/file.txt 2
D:/Temp/abc2/file.txt 1
D:/Temp/abc3/file.txt 1
D:/Temp/abc4/file.txt 1
I would use
find D:/Temp -type f -name "file.txt" -exec dirname {} \; > tmpfile
wc -l tmpfile
cat tmpfile
rm tmpfile
Give a try to this safe and standard version:
find D:/Temp -type f -name file.txt -printf "%p\0" | xargs -0 bash -c 'grep -Hc "pattern" "$@"' _ | grep ":[1-9][0-9]*$"
For each file.txt found in the D:/Temp directory and its sub-directories, the xargs command prints the filename (-H) and the number of lines which contain pattern (grep -c).
A final grep ":[1-9][0-9]*$" selects only filenames with a count greater than 0.
The way I'm reading your question, I'm going to answer as if:
some but not all file.txt files contain pattern,
you want a list of the paths leading to file.txt with pattern, and
you want a count of pattern in each of those files.
There are a few options. (Always multiple ways to do anything.)
If your bash is version 4 or higher, you can use globstar to recurse through directories:
shopt -s globstar
for file in **/file.txt; do
    if count=$(grep -c 'pattern' "$file"); then
        printf "%d %s\n" "$count" "${file%/*}"
    fi
done
This works because the if evaluation considers a failed grep (i.e. zero occurrences) to be FALSE, and thus does not print results.
Note that this may be high impact because it launches a separate grep on each file that is found. A lighter weight alternative might be to run a single grep on the fileglob, and parse the results:
shopt -s globstar
grep -c 'pattern' **/file.txt | grep -v ':0$'
This also depends on bash 4, and of course if you have millions of files you may overwhelm bash's command line maximum length. The output of this will be obvious, but you'll need to parse it with care if your filenames contain colons. I.e. cut -d: -f2 may not cut it.
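If colons in file names are a real concern, one safe way to split the output (a sketch; it relies on the count being whatever follows the last colon) is:
shopt -s globstar
grep -c 'pattern' **/file.txt | grep -v ':0$' | while IFS= read -r line; do
    printf '%s %s\n' "${line##*:}" "${line%:*}"   # count, then path
done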
One more option that leverages grep instead of bash might be:
grep -r --include 'file.txt' -c 'pattern' ./ | grep -v ':0$'
This uses GNU grep's --include option, which modifies the behaviour of -r (recursive). It should work in Linux, FreeBSD, NetBSD, OS X, but not with the default grep on OpenBSD or most SVR4 systems (Solaris, HP-UX, etc).
Note that I have tested none of these. No liability assumed. May contain nuts.
This should do it:
find . -name "file.txt" -type f -printf '%p\n' | awk '{print} END { print NR }'

bzgrep not printing the file name

find . -name '{fileNamePattern}*.bz2' | xargs -n 1 -P 3 bzgrep -H "{patternToSearch}"
I am using the command above to find a .bz2 file, from a set of files, that contains a pattern I am looking for. It does go through the files, because I can see the pattern I am trying to find being printed on the console, but I don't see the file name.
If you look at the bzgrep script (for example this version for OS X) you will see that it pipes the output from bzip2 through grep. That process loses the original filenames. grep never sees them so it cannot print them out (despite your -H flag).
Something like this should do; it's not exactly what you want, but close. (You could get the prefix you were expecting by piping the output from bzgrep into sed or awk, but that makes for a somewhat less simple command.)
find . -name '{fileNamePattern}*.bz2' -printf '### %p\n' -exec bzgrep "{patternToSearch}" {} \;
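For completeness, a rough sketch of the sed variant mentioned above, prefixing each match with its file name (the {...} placeholders are kept from the question, and the file names are assumed not to contain a | character, which is used as the sed delimiter):
find . -name '{fileNamePattern}*.bz2' -exec sh -c 'bzgrep "{patternToSearch}" "$1" | sed "s|^|$1:|"' _ {} \;
It spawns one shell per archive, which is slower but keeps the pairing of file name and match straightforward.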
I printed the file name through echo command and xargs.
find . -name "*bz2" | parallel -j 128 echo -n {}\" \" | xargs bzgrep {pattern}
Etan is very close with his answer: grep indeed does not show the filename when dealing with only one file, so you can make grep believe it's looking at multiple files just by adding the NULL file, so the command becomes:
find . -name '{fileNamePattern}*.bz2' -printf '### %p\n' -exec bzgrep "{patternToSearch}" {} /dev/null \;
(It's a dirty trick but it's helping me already for more than 15 years :-) )

Solaris unix, c-shell, redirecting xargs executed command output

Have no choice about the c-shell. It's what we use here.
So I want to go through the current directory and all sub-directories looking for files of the form *.utv, and egrep each one to look for a specific account number in the file.
I tried something like this:
egrep -l "ACCOUNT NO: +700 " `find . -name "*.utv" ` | more
but got "Too many words from `` " message.
So I'm using xargs, because apparently I'm getting too many file names passed back to the egrep command line.
When I do this:
find . -name "*.utv" | xargs -n1 egrep -i -l '"ACCOUNT NO: +700 "' {} >&! /home/me/output.txt
"ps -ef" command shows:
% ps -ef | egrep -i "myuserid"
myuserid 20791 22549 0 18:19:38 pts/20 0:00 find . -name *.utv
myuserid 20792 22549 0 18:19:38 pts/20 0:00 xargs -n1 egrep -i -l "ACCOUNT NO: +700 "
myuserid 22774 20792 1 18:21:13 pts/20 0:04 egrep -i -l "ACCOUNT NO: +700 " ./01/130104_reportfile.utv
%
But I get no output in the "output.txt" file.
If I run the egrep part by hand in the same directory, I get a list of file names containing the account 700 string.
I'm sure it's just a matter of grouping, quoting proper, and/or having the redirect in the right place, but after quite a lot of trial-and-error (and searching here) I'm still not getting anywhere.
Any suggestions?
You only need either single quotes or double quotes (but not both) around the search, as in your original command; with both sets, the inner double quotes become part of the pattern egrep searches for, which is why nothing matched:
find . -name "*.utv" | xargs -n1 egrep -i -l "ACCOUNT NO: +700 " {} >&! /home/me/output.txt
find . -name "*.utv" | xargs -n1 egrep -i -l 'ACCOUNT NO: +700 ' {} >&! /home/me/output.txt
I'd also lose the -n1, the -i and the {} from the command line. A trick to always get file names listed is to specify /dev/null as a name, but the -l also does the job:
find . -name "*.utv" | xargs egrep -l 'ACCOUNT NO: +700 ' >&! /home/me/output.txt
And you need to enlighten the powers that be that C shell is not good for programming. And you can always add exec /bin/bash -l to your .login script (or use /bin/ksh instead of /bin/bash). I simply wouldn't have any truck with "You cannot use a sane, civilized shell" rules.
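For example, a guarded line at the end of ~/.login could look like this (a sketch; adjust the shell path to taste):
if ( -x /bin/bash ) exec /bin/bash -l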

Multiple grep search/ignore patterns

I usually use the following pipeline to grep for a particular search string and yet ignore certain other patterns:
grep -Ri 64 src/install/ | grep -v \.svn | grep -v "file"| grep -v "2\.5" | grep -v "2\.6"
Can this be achieved in a succinct manner? I am using GNU grep 2.5.3.
Just pipe your unfiltered output into a single instance of grep and use an extended regexp to declare what you want to ignore:
grep -Ri 64 src/install/ | grep -v -E '(\.svn|file|2\.5|2\.6)'
Edit: To search multiple files maybe try
find ./src/install -type f -print |\
grep -v -E '(\.svn|file|2\.5|2\.6)' | xargs grep -i 64
Edit: Ooh, I forgot to add the simple trick to stop a cringeworthy use of multiple grep instances, namely
ps -ef | grep something | grep -v grep
Replacing that with
ps -ef | grep "[s]omething"
removes the need of the second grep.
Use the -e option to specify multiple patterns:
grep -Ri 64 src/install/ | grep -v -e '\.svn' -e file -e '2\.5' -e '2\.6'
You might also be interested in the -F flag, which indicates that patterns are fixed strings instead of regular expressions. Now you don't have to escape the dot:
grep -Ri 64 src/install/ | grep -vF -e .svn -e file -e 2.5 -e 2.6
I noticed you were grepping out ".svn". You probably want to skip any directories named ".svn" in your initial recursive grep. If I were you, I would do this instead:
grep -Ri 64 src/install/ --exclude-dir .svn | grep -vF -e file -e 2.5 -e 2.6
you can use awk instead of grep
awk '/64/&&!/(\.svn|file|2\.[56])/' file
You may want to use ack-grep, which lets you exclude with Perl regexps as well and skips all the VCS directories; it's great for grepping source code.
The following script will remove all files except a list of files:
echo cleanup_all $#
if [[ $# -eq 0 ]]; then
    FILES=`find . -type f`
else
    EXCLUDE_FILES_EXP="("
    # build an alternation like (./file1|./file2) from the arguments
    for EXCLUDED_FILE in "$@"
    do
        EXCLUDE_FILES_EXP="$EXCLUDE_FILES_EXP./$EXCLUDED_FILE|"
    done
    # strip the trailing |
    EXCLUDE_FILES_EXP="${EXCLUDE_FILES_EXP%?}"
    EXCLUDE_FILES_EXP="$EXCLUDE_FILES_EXP)"
    echo excluded files expression : $EXCLUDE_FILES_EXP
    FILES=`find . -type f | egrep -v "$EXCLUDE_FILES_EXP"`
fi
echo removing $FILES
for FILE in $FILES
do
    echo "cleanup: removing file $FILE"
    rm $FILE
done
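Hypothetical usage, assuming the script is saved as cleanup_all.sh and made executable: running
./cleanup_all.sh important.conf data/keep.csv
would delete every regular file under the current directory except ./important.conf and ./data/keep.csv.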
