find + sed, filename output

I have a directory, D:/Temp, containing many subfolders with text files. Each folder has a "file.txt". Some of the file.txt files contain the word "pattern". I would like to count the occurrences of "pattern" and also get the path to each file.txt that contains it:
find D:/Temp -type f -name "file.txt" -exec basename {} cat {} \; | sed -n '/pattern/p' | wc -l
Output should be:
4
D:/Temp/abc1/file.txt
D:/Temp/abc2/file.txt
D:/Temp/abc3/file.txt
D:/Temp/abc4/file.txt
Or similar.

You could use GNU grep:
grep -lr --include file.txt "pattern" "D:/Temp/"
This will return the file paths.
grep -cr --include file.txt "pattern" "D:/Temp/"
This will return a count for each file (the number of lines matching the pattern, not the number of files).
Explanation of the flags:
-r makes grep search its target recursively, so the target can be a directory.
--include <glob> restricts the recursive search to files matching the <glob>.
-l makes grep print only the paths of matching files. Additionally, it stops reading a file as soon as it has found the pattern.
-c makes grep print only the number of matching lines for each file.
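To get output in the shape the question asks for (a total followed by the paths), the -l listing can be captured once and counted. This is only a minimal sketch, assuming GNU grep, a literal search string, and file names without newlines; the function name list_and_count is made up for illustration.

```shell
# Hypothetical helper: print the number of matching files, then their paths.
# $1: directory to search, $2: literal string to look for.
list_and_count() {
    local matches
    # -F treats $2 as a fixed string; "|| true" keeps an empty result
    # from being reported as a failure.
    matches=$(grep -rlF --include=file.txt -- "$2" "$1" || true)
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches" | wc -l | tr -d ' '
        printf '%s\n' "$matches"
    else
        echo 0
    fi
}
```

It would be called as list_and_count D:/Temp pattern.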

If your file names don't contain whitespace then all you need is:
awk '/pattern/{print FILENAME; cnt++; nextfile} END{print cnt+0}' $(find D:/Temp -type f -name "file.txt")
The above uses nextfile, which GNU awk supports.

I'd suggest using two commands, one to find all the files:
find ./ -name "file.txt" -exec grep -lF "pattern" {} \;
and another to count them:
find ./ -name "file.txt" -exec grep -lF "pattern" {} \; | wc -l

Previously I've used:
grep -Hc "pattern" $(find D:/temp -type f -name "file.txt")
This only works if at least one file.txt is found; with no file arguments, grep reads standard input instead. The following handles both cases:
searchFiles=$(find D:/temp -type f -name "file.txt"); [[ -n "$searchFiles" ]] && grep -Hc "pattern" $searchFiles
The output would look like:
D:/Temp/abc1/file.txt:2
D:/Temp/abc2/file.txt:1
D:/Temp/abc3/file.txt:1
D:/Temp/abc4/file.txt:1
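The unquoted $(find ...) is split on whitespace, so paths containing spaces would break it. One possible alternative is to let find hand the paths to grep directly. This is a sketch assuming a grep that supports -H (GNU and BSD grep do); count_per_file is a hypothetical name.

```shell
# Hypothetical helper: print "path:count" for every file.txt under $1.
# $1: directory to search, $2: pattern to count.
count_per_file() {
    # If find matches no files it never runs grep, so no emptiness check
    # is needed; "+" batches many paths into one grep invocation.
    find "$1" -type f -name 'file.txt' -exec grep -Hc -- "$2" {} +
}
```

Files without a match are reported with a count of 0, which can be filtered out afterwards if only matching files are wanted.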

I would use
find D:/Temp -type f -name "file.txt" -exec dirname {} \; > tmpfile
wc -l tmpfile
cat tmpfile
rm tmpfile

Give this version a try; it is safe for file names containing whitespace:
find D:/Temp -type f -name file.txt -print0 | xargs -0 grep -Hc "pattern" | grep ":[1-9][0-9]*$"
For each file.txt file found in D:/Temp directory and sub-directories, the xargs command prints the filename and the number of lines which contain pattern (grep -c).
A final grep ":[1-9][0-9]*$" selects only filenames with a count greater than 0.

The way I'm reading your question, I'm going to answer as if:
some but not all file.txt files contain pattern,
you want a list of the paths leading to file.txt with pattern, and
you want a count of pattern in each of those files.
There are a few options. (Always multiple ways to do anything.)
If your bash is version 4 or higher, you can use globstar to recurse through directories:
shopt -s globstar
for file in **/file.txt; do
    if count=$(grep -c 'pattern' "$file"); then
        printf "%d %s\n" "$count" "${file%/*}"
    fi
done
This works because the if evaluation considers a failed grep (i.e. zero occurrences) to be FALSE, and thus does not print results.
Note that this may be high impact because it launches a separate grep on each file that is found. A lighter weight alternative might be to run a single grep on the fileglob, and parse the results:
shopt -s globstar
grep -c 'pattern' **/file.txt | grep -v ':0$'
This also depends on bash 4, and of course if you have millions of files you may overwhelm bash's command line maximum length. The output of this will be obvious, but you'll need to parse it with care if your filenames contain colons. I.e. cut -d: -f2 may not cut it.
One more option that leverages grep instead of bash might be:
grep -r --include 'file.txt' -c 'pattern' ./ | grep -v ':0$'
This uses GNU grep's --include option, which modifies the behaviour of -r (recursive). It should work in Linux, FreeBSD, NetBSD, OSX, but not with the default grep on OpenBSD or most SVR4 (Solaris, HP-UX, etc).
Note that I have tested none of these. No liability assumed. May contain nuts.
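On the point about file names containing colons: since the count is always the last colon-separated field of grep -c output, one way to split a line is to cut at the final colon instead of the first. A small bash sketch; split_count_line is a made-up helper name.

```shell
# Hypothetical helper: turn one "path:count" line from grep -c into
# "count<TAB>path", splitting at the LAST colon so that colons inside
# the path are preserved.
split_count_line() {
    local line=$1
    # ${line##*:} drops everything up to the last colon (leaves the count);
    # ${line%:*} drops the last colon and what follows (leaves the path).
    printf '%s\t%s\n' "${line##*:}" "${line%:*}"
}
```

It could be applied to each line read from the grep pipeline above, for example inside a while IFS= read -r line loop.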

This should do it:
find . -name "file.txt" -type f -printf '%p\n' | awk '{print} END { print NR }'

Related

How to rename multiple files in several folders?

I'd like to rename all files in several folders whose names match '*file*', replacing 'file' with 'doc'. I've tried
find . -name "*file*" -exec mv {} `echo {} | sed "s/file/doc/"` \;
but got an error (see below).
~$ ls
my_file_1.txt my_file_2.txt my_file_3.txt
~$ find . -name "*file*"
./my_file_1.txt
./my_file_3.txt
./my_file_2.txt
~$ echo my_file_1.txt | sed "s/file/doc/"
my_doc_1.txt
~$ find . -name "*file*" -exec echo {} \;
./my_file_1.txt
./my_file_3.txt
./my_file_2.txt
~$ find . -name "*file*" -exec mv {} `echo {} | sed "s/file/doc/"` \;
mv: './my_file_1.txt' and './my_file_1.txt' are the same file
mv: './my_file_3.txt' and './my_file_3.txt' are the same file
mv: './my_file_2.txt' and './my_file_2.txt' are the same file
Many thanks for your help!

There are a thousand ways to do it, I'd do it with Perl, something like this will work:
find files -type f -name "file*" | perl -ne 'chomp; $f=$_; $f=~s/\/file/\/doc/; `mv $_ $f`;'
-ne runs the inline script once for each line of input
chomp removes the trailing newline
$f is the new filename, initially a copy of the old filename
s/\/file/\/doc/ replaces "/file" with "/doc" in the new filename
`mv $_ $f` renames the file by running an OS command with backticks

The problem with your solution is that the echo {} | sed "s/file/doc/" is executed before the rest of the find command. I tried to make a command demonstrating this:
find . -name "." -exec date \; -exec echo `date; sleep 5` \;
If the date commands were executed from left to right, the first date would not be later than the second. However, the backquoted date and sleep are executed by the shell before find even starts, so the second line of output shows the earlier time.
Result:
Wed Aug 25 22:33:43 XXX 2021
Wed Aug 25 22:33:38 XXX 2021
The following solution is using print0 and xargs -0 for filenames with newlines. xargs will echo the mv command with two additional slashes.
The slashes will be found by the sed command, changing the target filename.
The result of sed is parsed by a new bash shell.
find . -name "*file1*" -print0 2>/dev/null |
xargs -0 -I {} echo mv '"{}"' //'"{}"' |
sed -r 's#//(.*)file(.*)#\1doc\2#' |
bash
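Another way to avoid the early expansion is to let find start a shell for each batch of files and do the substitution inside that shell. This is a sketch only, assuming bash is available; rename_file_to_doc is a made-up name, and only the first "file" in the base name is replaced.

```shell
# Hypothetical helper: rename every regular file under $1 whose name
# contains "file", replacing the first "file" in the base name with "doc".
rename_file_to_doc() {
    find "$1" -type f -name '*file*' -exec bash -c '
        for path; do
            dir=${path%/*}      # directory part, left untouched
            base=${path##*/}    # file name without the directory
            mv -- "$path" "$dir/${base/file/doc}"
        done' bash {} +
}
```

Because the paths are passed as arguments instead of being pasted into a command string, names containing spaces or quotes are handled as they are.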

See if you have rename command. If it is perl based:
# -n is for testing, remove it for actual renaming
find -name '*file*' -exec rename -n 's/file/doc/' {} +
If it is not perl based, see if this works:
# remove --no-act --verbose for actual renaming
find -name '*file*' -exec rename --no-act --verbose 'file' 'doc' {} +

Removing Files with specific ending. Need something more specific

I'm trying to purge all thumbnails created by Wordpress because of a CMS switchover that I'm planning.
find -name \*-*x*.* | xargs rm -f
But I don't know bash or regex well enough to make it more specific, so that only the generated thumbnails are removed.
All generated files have names of the form
<img-name>-<width:integer>x<height:integer>.<file-ext>

You didn't quote or escape all your wildcards, so the shell will try to expand them before find executes.
Quoting it should work
find -name '*-*x*.*'| xargs echo rm -f
Remove the echo when you're satisfied it works. You could also check that two of the fields are numbers by switching to -regex, but not sure if you need/want that here.
Regex solution:
find -regex '^.*/[A-Za-z]+-[0-9]+x[0-9]+\.[A-Za-z]+$' | xargs echo rm -f
Note: I'm assuming img-name and file-ext can only contain letters
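Before deleting anything, it may help to try a pattern against a few sample names, because the loose glob '*-*x*.*' also matches names such as my-xray.png. A bash sketch using an extended regex; the helper name is_thumbnail and the choice to allow digits in the extension are assumptions, not part of the answer above.

```shell
# Hypothetical check: succeed if the name ends in -<digits>x<digits>.<ext>.
is_thumbnail() {
    # Keeping the regex in a variable avoids quoting surprises inside [[ ]].
    local re='-[0-9]+x[0-9]+\.[A-Za-z0-9]+$'
    [[ $1 =~ $re ]]
}
```

For example, is_thumbnail photo-150x150.jpg succeeds, while is_thumbnail my-xray.png fails.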

You can try this:
find -type f | grep -P '\w+-\d+x\d+\.\w+$' | xargs rm
If you have spaces in the path:
find -type f | grep -P '\w+-\d+x\d+\.\w+$' | sed -re 's/(\s)/\\\1/g' | xargs rm
Example:
find -type f | grep -P '\w+-\d+x\d+\.\w+$' | sed -re 's/(\s)/\\\1/g' | xargs ls -l
-rw-rw-r-- 1 tiago tiago 0 Jun 22 15:14 ./image-800x600.png
-rw-rw-r-- 1 tiago tiago 0 Jun 22 15:17 ./test 2/test 3/image-800x600.png

The GNU find command below will remove all files which contain the string <img-name>-<width:integer>x<height:integer>.<file-ext> syntax. I also assumed that the corresponding files have a . in their file names.
find . -name "*.*" -type f -exec grep -l '<img-name>-<width:integer>x<height:integer>.<file-ext> syntax' {} \; | xargs rm -f
Explanation:
. The directory in which the find operation takes place (. represents your current directory).
-name "*.*" The file name must contain a dot.
-type f Only files.
-exec grep -l '<img-name>-<width:integer>x<height:integer>.<file-ext> syntax' {} Prints the names of the files which contain the above-mentioned pattern.
xargs rm -f Each file name found is fed to xargs, and the file is removed.

How to run a command on all results of find?

Using find I create a file that contains all the files that use a specific key word:
find . -type f | xargs grep -l 'foo' > foo.txt
I want to take that list in foo.txt and maybe run some commands using that list, i.e. run an ls command on the list contained within the file.

You don't need xargs to create foo.txt. Just execute the command with -exec like this:
find . -type f -exec grep -l 'foo' {} \; > foo.txt
Then you can run ls against the file by looping through the file:
while IFS= read -r file
do
    ls "$file"
done < foo.txt
It is a little ugly, but this also works, as long as no file name contains whitespace or glob characters:
ls $(cat foo.txt)

You can use xargs like this:
xargs ls < foo.txt
The advantage of xargs is that it will execute the command with multiple arguments which is more efficient than executing the command once per argument using a loop, for example.
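By default xargs splits its input on blanks and treats quotes specially, so names with spaces need extra care. A bash-only sketch (mapfile needs bash 4 or later) that reads one path per line and runs a command once with all of them; run_on_list is a hypothetical helper name.

```shell
# Hypothetical helper: run a command once with every line of a list file
# appended as a separate argument.
# $1: list file (one path per line); remaining arguments: the command.
run_on_list() {
    local list=$1
    shift
    local -a files
    mapfile -t files < "$list"      # one array element per line
    # Nothing to do for an empty list.
    (( ${#files[@]} > 0 )) || return 0
    "$@" "${files[@]}"
}
```

With the file from the question this would be run_on_list foo.txt ls -l. Unlike xargs, it does not split very long lists into several invocations.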

How to move or copy files listed by 'find' command in unix?

I have a list of certain files that I see using the command below, but how can I copy those files listed into another folder, say ~/test?
find . -mtime 1 -exec du -hc {} +

Adding to Eric Jablow's answer, here is a possible solution (it worked for me - linux mint 14 /nadia)
find /path/to/search/ -type f -name "glob-to-find-files" | xargs cp -t /target/path/
You can refer to "How can I use xargs to copy files that have spaces and quotes in their names?" as well.

Actually, you can process the find command output in a copy command in two ways:
If the find command's output doesn't contain any space, i.e if the filename doesn't contain a space in it, then you can use:
Syntax:
find <Path> <Conditions> | xargs cp -t <copy file path>
Example:
find -mtime -1 -type f | xargs cp -t inner/
But production data file names might contain spaces, so most of the time this form is more reliable:
Syntax:
find <path> <condition> -exec cp '{}' <copy path> \;
Example
find -mtime -1 -type f -exec cp '{}' inner/ \;
In the second example, the trailing semicolon is part of the find command and must be escaped so that the shell does not interpret it. Otherwise you will get an error like:
find: missing argument to `-exec'

find /PATH/TO/YOUR/FILES -name NAME.EXT -exec cp -rfp {} /DST_DIR \;

If you're using GNU find,
find . -mtime 1 -exec cp -t ~/test/ {} +
This works as well as piping the output into xargs while avoiding the pitfalls of doing so (it handles embedded spaces and newlines without having to use find ... -print0 | xargs -0 ...).
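cp -t is a GNU extension. Where it is unavailable, a small sh wrapper can put the destination last while still batching with +. This is a sketch under those assumptions; copy_found and its two arguments are illustrative names, and the destination should live outside the searched tree.

```shell
# Hypothetical helper: copy regular files modified within the last day
# from $1 (searched recursively) into the directory $2.
copy_found() {
    find "$1" -type f -mtime -1 -exec sh -c '
        dest=$1     # first argument is the destination directory
        shift       # the rest are the files found
        cp -- "$@" "$dest"' sh "$2" {} +
}
```

It would be called as copy_found . ~/test. Files with the same base name in different subdirectories overwrite each other in the destination.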

This is the best way for me:
while IFS= read -r FILENAME
do
    sudo find /PATH_FROM/ -maxdepth 4 -name "$FILENAME" -exec cp '{}' /PATH_TO/ \;
done < filename.tsv

How to limit grep to only search the files that you want

We have a rather large and complex file system and I am trying to generate a list of files containing a particular text string. This should be simple, but I need to exclude the './svn' and './pdv' directories (and probably others) and to only look at files of type *.p, *.w or *.i.
I can easily do this with a program, but it is proving very slow to run. I want to speed up the process (so that I'm not searching thousands of files repeatedly) as I need to run such searches against a long list of criteria.
Normally, we search the file system using:
find . -name "*.[!r]*" -exec grep -i -l "search for me" {} \;
This is working, but I'm then having to use a program to exclude the unwanted directories , so it is running very slowly.
After looking at the topics here:
Stack Overflow thread
I've decided to try a few other approaches:
grep -ilR "search for me" . --exclude ".svn" --excluse "pdv" --exclude "!.{p,w,i*}"
Excludes the './svn', but not the './pdv' directories, Doesn't limit the files looked at.
grep -ilR "search for me" . --exclude ".svn" --excluse "pdv" --include "*.p"
Excludes the './svn', but not the './pdv' directories, Doesn't limit the files looked at.
find . -name "*.[!r]*" -exec grep -i -l ".svn" | grep -i -l "search for me" {} \;
I can't even get this (or variations on it) to run successfully.
find . ! -name "*.svn*" -prune -print -exec grep -i -l "search for me" {} \;
Doesn't return anything. It looks like it stops as soon as it finds the .svn directory.

How about something like:
find . \( \( -name .svn -o -name pdv \) -type d -prune \) -o \( -name '*.[pwi]' -type f -exec grep -i -l "search for me" {} + \)
This will:
- ignore the contents of directories named .svn and pdv
- grep regular files named *.[pwi] (-type f does not match symlinks unless find is run with -L)
The + terminator after -exec means gather as many files into a single command as will fit on the command line (roughly 1 million chars in Linux). This can seriously speed up processing if you have to iterate over thousands of files.
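The same expression can be wrapped in a function so that it is easy to rerun against a long list of search strings. A sketch only; search_sources is a hypothetical name, and the directory names and suffixes are the ones from the question.

```shell
# Hypothetical helper: list *.p, *.w and *.i files under $1 that contain
# $2 (case-insensitive), skipping anything inside .svn or pdv directories.
search_sources() {
    find "$1" \( -name .svn -o -name pdv \) -type d -prune -o \
        -type f -name '*.[pwi]' -exec grep -il -- "$2" {} +
}
```

Because -exec is the only action, pruned directories are not printed; only the paths reported by grep appear in the output.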

Following command finds only *.rb files containing require 'bundler/setup' line and excludes search in .git and .bundle directories. That is the same use case I think.
grep -ril --exclude-dir .git --exclude-dir .bundle \
--include \*.rb "^require 'bundler/setup'$" .
The problem was with swapping of --exclude and --exclude-dir parameters I believe. Refer to the grep(1) manual.
Also note that exclude/include parameters accept GLOB only, not regexps, therefore single character suffix range can be done with one --include parameter, but more complex conditions would require more of the parameters:
--include \*.[pwi] --include \*.multichar_sfx ...
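Applied to the directories and suffixes from the question, the grep-only form might look like the sketch below. It assumes a grep with --exclude-dir and --include (GNU grep has both); grep_sources is a made-up function name.

```shell
# Hypothetical helper: the same search done by grep alone.
# $1: directory to search, $2: text to look for (case-insensitive).
grep_sources() {
    grep -ril --exclude-dir=.svn --exclude-dir=pdv \
        --include='*.[pwi]' -- "$2" "$1"
}
```

--exclude-dir takes directory names and --include takes file-name globs, which is the distinction the failed attempts in the question mixed up.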

You can try the following:
find path_starting_point -type f | grep regex_to_filter_file_names | xargs grep regex_to_find_inside_matched_files

find . -name "filename_glob" | grep -v -e '/\.svn/' -e '/pdv/' | xargs grep -i 'your search string'
