How to find (and rename) file and folder names with octal characters - unix

I accidentally copied files with the wrong encoding, so instead of UTF-8 the file and folder names appear with octal escape sequences, i.e. they are for example called L\334ten.txt instead of Löten.txt. I would like to (at least) find all affected files and folders; ideally I would be able to rename the files automatically (so \334 to ö and so on). If changing the encoding is an option, that's of course okay, too. A Bash solution would be best, but I am open to using Python or something similar.
I tried identifying the files/folders using grep/find, but sadly without any luck.

The quick and dirty solution:
for file in $(find . -regextype posix-extended -regex '.*[\][0-9]{3}.*'); do
    OLD_NAME=$(basename "$file")
    NEW_NAME=$(echo "$OLD_NAME" |
        sed 's/\\337/ß/g' |
        sed 's/\\344/ä/g' |
        sed 's/\\366/ö/g' |
        sed 's/\\374/ü/g')
    mv "$file" "$(dirname "$file")/$NEW_NAME"
done
Proof:
$ touch 'W\344rme.txt' 'L\366ten.txt' 'l\366tf\344hige.txt'
$ ls
'L\366ten.txt' 'l\366tf\344hige.txt' 'W\344rme.txt'
$ copy_paste_oneliner_here
$ ls
Löten.txt lötfähige.txt Wärme.txt
UPDATE:
@rt87 If I understood your comment correctly, it's possible to emulate your weird filenames:
$ touch $(echo "Löten.txt" | iconv -f UTF-8 -t ISO-8859-1)
So now we have a file whose name is incorrectly encoded for a UTF-8 locale - L�ten.txt. In the terminal you can see:
$ ls
'L'$'\366''ten.txt'
Thus, you can get back your files with another oneliner:
for file in *.*; do mv "$file" "$(echo "$file" | iconv -f ISO-8859-1 -t UTF-8)"; done
In our test example we got:
$ ls
Löten.txt
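If the affected names are spread across subdirectories, the same iconv idea can be combined with find. A minimal sketch, assuming GNU find and bash (the [! -~] pattern, which matches any name containing a byte outside printable ASCII, is my way of locating the affected entries):
find . -depth -name '*[! -~]*' -print0 |
while IFS= read -r -d '' path; do
    # -depth lists a directory's contents before the directory itself,
    # so renaming a folder never invalidates the paths of the files inside it.
    dir=$(dirname "$path")
    old=$(basename "$path")
    new=$(printf '%s' "$old" | iconv -f ISO-8859-1 -t UTF-8)
    [ "$old" = "$new" ] || mv -- "$path" "$dir/$new"
done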

Related

Unix case insensitive command line search containing wildcards and spaces

I am attempting to come up with a method to remotely find a list of files on our AIX UNIX machine that meet what seems, in Windows, like simple criteria. It needs to be case insensitive (sigh), use wildcards (*), and possibly contain spaces in the path.
For my tests below I was using the ksh shell. However it will need to work in an ssh shell as well.
I am attempting to implement secure FTP in Visual Basic 6 (I know) using plink, command line and a batch file.
Basically find a file like the one below but with case insensitivity:
ls -1 -d -p "/test/rick/01012017fosterYYY - Copy.txt" | grep -v '.*/$'
Thanks for any help.
ls -1 -d -p /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy] - [Cc][Oo][Pp][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
fails with:
ls: 0653-341 The file /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy] does not exist.
ls: 0653-341 The file - does not exist.
ls: 0653-341 The file [Cc][Oo][Pp][Yy].[Tt][Xx][Tt] does not exist.
ls -1 -d -p /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
success - as long as there are no spaces.
ls -1 -d -p "/test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy].[Tt][Xx][Tt]" | grep -v '.*\/$'
fails with:
ls: 0653-341 The file /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy].[Tt][Xx][Tt] does not exist.
-- Assumption: we cannot use quotes with wildcard characters
ls -1 -d -p "/test/rick/01012017fosterYYY - Copy.txt" | grep -v '.*\/$'
success, but not case insensitive.
ls -1 -d -p /test/rick/[0][1][0][1][2][0][1][7][Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy] - [Cc][Oo][Pp][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
fails with:
ls: 0653-341 The file /test/rick/[0][1][0][1][2][0][1][7][Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy][ does not exist.
ls: 0653-341 The file ][-][ does not exist.
ls: 0653-341 The file ][Cc][Oo][Pp][Yy].[Tt][Xx][Tt] does not exist.
ls -1 -d -p /test/rick/[0][1][0][1][2][0][1][7][Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy][ ][-][ ][Cc][Oo][Pp][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
fails with:
ls: 0653-341 The file /test/rick/[0][1][0][1][2][0][1][7][Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy][ does not exist.
ls: 0653-341 The file ][-][ does not exist.
ls: 0653-341 The file ][Cc][Oo][Pp][Yy].[Tt][Xx][Tt] does not exist.
ls -1 -d -p /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy]?-?[Cc][Oo][Pp][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
success - not very helpful though.
ls -1 -d -p /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy][ ]-[ ][Cc][Oo][Pp][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
fails with:
ls: 0653-341 The file /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy][ does not exist.
ls: 0653-341 The file ]-[ does not exist.
ls: 0653-341 The file ][Cc][Oo][Pp][Yy].[Tt][Xx][Tt] does not exist.
ls -1 -d -p /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy]{ }-{ }[Cc][Oo][Pp][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
fails with:
ls: 0653-341 The file /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy]{ does not exist.
ls: 0653-341 The file }-{ does not exist.
ls: 0653-341 The file }[Cc][Oo][Pp][Yy].[Tt][Xx][Tt] does not exist.
ls -1 -d -p /test/rick/*01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy] - [Cc][Oo][Pp][Yy].[Tt][Xx][Tt]* | grep -v '.*\/$'
fails with:
ls: 0653-341 The file /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy] does not exist.
ls: 0653-341 The file - does not exist.
ls: 0653-341 The file [Cc][Oo][Pp][Yy].[Tt][Xx][Tt] does not exist.
ls -1 -d -p "/test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy] - [Cc][Oo][Pp][Yy].[Tt][Xx][Tt]" | grep -v '.*\/$'
fails with:
ls: 0653-341 The file /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy] - [Cc][Oo][Pp][Yy].[Tt][Xx][Tt] does not exist.
ls doesn't do pattern matching; any wildcard expansion (globbing) is done by the shell. The glob pattern language is different from regular expressions. Read the ksh documentation for information about globbing ("File Name Generation" in the manpage).
So when you do:
$ touch foo flo fum
$ ls -1 f[ol]o
flo
foo
... the shell notices the globbing characters [], reads the directory contents, replaces the pattern with the matching filenames, and passes those as parameters to ls. You can show this by using echo instead:
$ echo f[ol]o
flo foo
ksh has globbing options available with the ~() construct; option i means "treat the match as case insensitive":
ksh$ touch foo FoO FOO
ksh$ echo ~(i)foo
foo FoO FOO
bash has a nocaseglob shopt option:
bash$ shopt -s nocaseglob
bash$ touch fOo
bash$ echo FO*
fOo
Although note that some globbing character needs to be present to make the magic happen:
bash$ echo FOO
FOO
bash$ echo [F]OO
fOo
(to keep this option change local, see https://unix.stackexchange.com/questions/310957/how-to-undo-a-set-x/310963)
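For instance, running the glob in a throwaway subshell keeps the option from leaking into the rest of the session (reusing the file from the example above):
bash$ ( shopt -s nocaseglob && echo [F]OO )
fOo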
It looks as if you're using grep -v '.*/$' to remove lines that are directories. The .* is superfluous here -- grep -v '/$' is equivalent.
But find is a better tool for this kind of searching and filtering, implementing -type f (match regular files) by actually looking at the file attributes, rather than by parsing a bit of ASCII in a listing.
$ touch foo FOO FoO
$ mkdir fOo
$ find . -maxdepth 1 -type f -iname "foo"
./FOO
./foo
./FoO
You could use find's -iname option to allow for case-insensitive searching, so for the example you've provided any of the following should find your file:
find /test/rick -maxdepth 1 -iname '01012017fosterYYY - copy.txt'
# or
find /test/rick -maxdepth 1 -iname '01012017fosteryyy - copy.txt'
# or
find /test/rick -maxdepth 1 -iname '01012017FOSTERyyy - cOpY.txt'
-maxdepth 1 : don't search in sub-directories
-iname : allow for case-insensitive searching
For case-insensitive wildcard searches when the -maxdepth and -iname flags are not available in AIX find, you can pass the find results to grep:
find /test/rick/. \( ! -name . -prune \) -type f -print | grep -i ".*foster.*\.txt"
find [InThisFolder] [ExcludeSubfolders] [FileTypes] | grep [InsensitiveWildcardName]
Though, this can still be problematic if you have a folder structure like "/test/rick/rick/".
The following code gives results with the current directory signifier ".":
find /test/rick/. \( ! -name . -prune \) -type f -print | grep -i ".*foster.*\.txt"
But you can pass the results to sed to find "/./" and replace it with "/".
find /test/rick/. \( ! -name . -prune \) -type f -print | grep -i ".*foster.*\.txt" | sed 's/\/\.\//\//g'
UPDATE:
Based on this page: http://mywiki.wooledge.org/ParsingLs
I’ve come up with the following command (for loop on file expansion or globbing) which avoids the problematic "/test/rick/rick/" folder structure from the find | grep solution above. It searches a folder from any folder, handles spaces, and handles case insensitivity without having to specify escape characters or upper/lower matching ([Aa]).
Just modify the searchfolder and searchpattern:
searchfolder="/test/rick"
searchpattern="*foster*.txt"
for file in "$searchfolder"/*.*; do
    [[ -e "$file" ]] || continue
    if [[ "$(basename "$file" | tr '[:upper:]' '[:lower:]')" = $searchpattern ]]; then
        echo "$file"
    fi
done
It does this:
Sets the folder path to search (searchfolder="/test/rick")
Sets the search pattern (searchpattern="*foster*.txt")
Loops over every file in the search folder (for file in "$searchfolder"/*.*)
Makes sure the file exists ([[ -e "$file" ]] || continue)
Transforms any uppercase characters in the base file name to lowercase (basename "$file" | tr '[:upper:]' '[:lower:]')
Tests whether the lowercased base file name matches the search pattern and, if so, prints the full path and filename (if [[ "$(basename "$file" | tr '[:upper:]' '[:lower:]')" = $searchpattern ]]; then echo "$file"; fi)
Tested on AIX (Version 6.1.0.0) in ksh (Version M-11/16/88f) and ksh93 (Version M-12/28/93e).
What I finally used (because I don't have access to -maxdepth or -iname) was just to use case insensitive wildcards together with quotes around spaces.
ls -1 -d -p /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy]' '-' '[Cc][Oo][Pp][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
That way I don't have to install or upgrade anything and probably cause more problems just so I can get a simple list of files.
NOTE: AIX UNIX will still throw in some garbage errors if you have any subdirectories under the path. I tapped out on this and just parsed these useless messages out on the client side.
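If the goal is just to hide those messages, redirecting standard error may be enough (assuming, as is usual for ls, that the 0653-341 diagnostics go to stderr):
ls -1 -d -p /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy]' '-' '[Cc][Oo][Pp][Yy].[Tt][Xx][Tt] 2>/dev/null | grep -v '/$'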
Thanks everyone who responded.

sed edit file in place

I am trying to find out if it is possible to edit a file in a single sed command without manually streaming the edited content into a new file and then renaming the new file to the original file name.
I tried the -i option but my Solaris system said that -i is an illegal option. Is there a different way?
The -i option streams the edited content into a new file and then renames it behind the scenes, anyway.
Example:
sed -i 's/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g' filename
while on macOS you need:
sed -i '' 's/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g' filename
On a system where sed does not have the ability to edit files in place, I think the better solution would be to use perl:
perl -pi -e 's/foo/bar/g' file.txt
Although this does create a temporary file, it replaces the original because an empty in-place suffix/extension has been supplied.
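If you want a safety copy with the perl route as well, the backup extension can be attached to -i, just as with GNU sed:
perl -pi.bak -e 's/foo/bar/g' file.txt   # original kept as file.txt.bak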
Note that on OS X you might get errors like "invalid command code" when running this command. To fix this issue, try
sed -i '' -e "s/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g" <file>
This is because on the macOS version of sed, the -i option expects an extension argument, so the sed expression is consumed as the extension argument and the file path is then interpreted as the command code. Source: https://stackoverflow.com/a/19457213
The following works fine on my mac
sed -i.bak 's/foo/bar/g' sample
We are replacing foo with bar in the sample file. A backup of the original file will be saved as sample.bak.
For editing in place without a backup, pass a separate empty suffix:
sed -i '' 's/foo/bar/g' sample
(Note that -i'' with no space collapses to plain -i after shell quote removal; GNU sed accepts that, but macOS sed does not.)
One thing to note: sed cannot write files on its own, as the sole purpose of sed is to act as an editor on the "stream" (i.e. pipelines of stdin, stdout, stderr, and other >&n buffers, sockets and the like). With this in mind you can use another command, tee, to write the output back to the file. Another option is to create a patch by piping the content into diff.
Tee method
sed 's/regex/replacement/' <file> | tee <file>
(beware that this races sed's reads against tee's truncation of the same file, so it can lose data; the patch method below is safer)
Patch method
sed 's/regex/replacement/' <file> | diff -p <file> /dev/stdin | patch
UPDATE:
Also, note that patch does not need to be told which file to change, as that is found on the first line of the output from diff:
$ echo foobar | tee fubar
$ sed 's/oo/u/' fubar | diff -p fubar /dev/stdin
*** fubar 2014-03-15 18:06:09.000000000 -0500
--- /dev/stdin 2014-03-15 18:06:41.000000000 -0500
***************
*** 1 ****
! foobar
--- 1 ----
! fubar
$ sed 's/oo/u/' fubar | diff -p fubar /dev/stdin | patch
patching file fubar
Versions of sed that support the -i option for editing a file in place write to a temporary file and then rename the file.
Alternatively, you can just use ed. For example, to change all occurrences of foo to bar in the file file.txt, you can do:
echo ',s/foo/bar/g; w' | tr \; '\012' | ed -s file.txt
Syntax is similar to sed, but certainly not exactly the same.
Even if you don't have a -i supporting sed, you can easily write a script to do the work for you. Instead of sed -i 's/foo/bar/g' file, you could do inline file sed 's/foo/bar/g'. Such a script is trivial to write. For example:
#!/bin/sh
# usage: inline <file> <command> [args...]
IN=$1
shift
trap 'rm -f "$tmp"' 0
tmp=$( mktemp )
<"$IN" "$@" >"$tmp" && cat "$tmp" > "$IN" # write back into the same inode to preserve hard links
should be adequate for most uses.
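Assuming you save the script above as inline somewhere on your PATH (the location here is hypothetical) and make it executable, usage mirrors the description:
chmod +x ~/bin/inline
inline file.txt sed 's/foo/bar/g'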
You could use vi
vi -c '%s/foo/bar/g' my.txt -c 'wq'
sed supports in-place editing. From man sed:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied)
Example:
Let's say you have a file hello.txt with the text:
hello world!
If you want to keep a backup of the old file, use:
sed -i.bak 's/hello/bonjour/' hello.txt
You will end up with two files: hello.txt with the content:
bonjour world!
and hello.txt.bak with the old content.
If you don't want to keep a copy, just don't pass the extension parameter.
If you are replacing the same number of characters, and after carefully reading “In-place” editing of files...
You can also use the redirection operator <> to open the file to read and write:
sed 's/foo/bar/g' file 1<> file
See it live:
$ cat file
hello
i am here # see "here"
$ sed 's/here/away/' file 1<> file # Run the `sed` command
$ cat file
hello
i am away # this line is changed now
From Bash Reference Manual → 3.6.10 Opening File Descriptors for Reading and Writing:
The redirection operator
[n]<>word
causes the file whose name is the expansion of word to be opened for
both reading and writing on file descriptor n, or on file descriptor 0
if n is not specified. If the file does not exist, it is created.
Like Moneypenny said in Skyfall: "Sometimes the old ways are best."
Kincade said something similar later on.
$ printf ',s/false/true/g\nw\n' | ed {YourFileHere}
Happy editing in place.
Added '\nw\n' to write the file. Apologies for the delay in answering the request.
You didn't specify what shell you are using, but with zsh you could use the =( ) construct to achieve this. Something along the lines of:
cp =(sed ... file; sync) file
=( ) is similar to >( ) but creates a temporary file which is automatically deleted when cp terminates.
mv file.txt file.tmp && sed 's/foo/bar/g' < file.tmp > file.txt
This avoids any need for a special version of sed, but note that it does not preserve hard links: the redirection creates a brand-new file.txt, while any other links still point at the old inode (now named file.tmp). If hard links matter, write back into the original file instead, as the inline script above does.
To resolve this issue on a Mac I had to install the GNU versions of these tools with Homebrew, following this:
brew install grep gnu-sed
==> Caveats
All commands have been installed with the prefix "g".
If you need to use these commands with their normal names, you
can add a "gnubin" directory to your PATH from your bashrc like:
PATH="/usr/local/opt/grep/libexec/gnubin:$PATH"
Call gsed instead of sed. The mac default doesn't like how grep -rl displays file names with the ./ prepended.
~/my-dir/configs$ grep -rl Promise . | xargs sed -i 's/Promise/Bluebird/g'
sed: 1: "./test_config.js": invalid command code .
I also had to use xargs -I{} sed -i 's/Promise/Bluebird/g' {} for files with a space in the name.
Very good examples. I had the challenge of editing many files in place, and the -i option seems to be the only reasonable way to do it within a find command. Here is the command to add "version " in front of the first line of each file:
find . -name pkg.json -print -exec sed -i '.bak' '1 s/^/version /' {} \;
In case you want to replace strings containing '/', you can use '?' as the delimiter instead, e.g. to replace '/usr/local/bin/python' with '/usr/bin/python3' in all *.py files:
find . -name \*.py -exec sed -i 's?/usr/local/bin/python?/usr/bin/python3?g' {} \;

Performing grep operation in tar files without extracting

I have a list of files which contain a particular pattern, but those files have been tarred. Now I want to search for the pattern in the tar file, and to know which files contain the pattern, without extracting the files.
Any idea...?
The tar command has a -O switch to extract your files to standard output, so you can pipe that output to grep/awk:
tar xvf test.tar -O | awk '/pattern/{print}'
tar xvf test.tar -O | grep "pattern"
e.g. to return the names of the files in which the pattern is found:
tar tf test.tar | while read -r FILE
do
    if tar xf test.tar "$FILE" -O | grep -q "pattern"; then
        echo "found pattern in : $FILE"
    fi
done
The command zgrep lets you search compressed files directly, for example:
zgrep "mypattern" *.gz
Note, however, that on a .tar.gz it searches the decompressed tar stream as one blob, so it can tell you that the pattern occurs somewhere in the archive, but not which member file contains it.
http://linux.about.com/library/cmd/blcmdl1_zgrep.htm
GNU tar has --to-command. With it you can have tar pipe each file from the archive into the given command. For the case where you just want the lines that match, that command can be a simple grep. To know the filenames you need to take advantage of tar setting certain variables in the command's environment; for example,
tar xaf thing.tar.xz --to-command="awk -e '/thing.to.match/ {print ENVIRON[\"TAR_FILENAME\"] \":\", \$0}'"
Because I find myself using this often, I have this:
#!/bin/sh
set -eu
if [ $# -lt 2 ]; then
echo "Usage: $(basename "$0") <pattern> <tarfile>"
exit 1
fi
if [ -t 1 ]; then
h="$(tput setf 4)"
m="$(tput setf 5)"
f="$(tput sgr0)"
else
h=""
m=""
f=""
fi
tar xaf "$2" --to-command="awk -e '/$1/{gsub(\"$1\", \"$m&$f\"); print \"$h\" ENVIRON[\"TAR_FILENAME\"] \"$f:\", \$0}'"
This can be done with tar --to-command and grep --label:
tar xaf archive.tar.gz --to-command 'egrep -Hn --label="$TAR_FILENAME" your_pattern_here || true'
--label gives grep the filename
-H tells grep to display the filename, and -n the line number
|| true because otherwise grep will exit with an error if the pattern is not found, and tar will complain about that.
xaf means to extract, and automagically decompress based on the file extension
--to-command has tar pass each file in the tarfile to a separate invocation of grep, and sets various environment variables with info about the file. See the manpage for more info.
Pretty heavily based off of Chipaca's answer (and Daniel H's comment), but this should be a bit easier to use and just uses tar and grep.
Python's tarfile module along with Tarfile.extractfile() will allow you to inspect the tarball's contents without extracting it to disk.
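A minimal sketch of that approach, runnable from the shell; tarfile.open(), isfile() and extractfile() are standard library, while the loop and output format here are my own:
python3 - archive.tar.gz pattern <<'EOF'
import sys, tarfile

archive, pattern = sys.argv[1], sys.argv[2].encode()
with tarfile.open(archive) as tar:            # mode "r" auto-detects compression
    for member in tar:
        if not member.isfile():
            continue
        fileobj = tar.extractfile(member)     # in-memory file object, nothing written to disk
        for lineno, line in enumerate(fileobj, 1):
            if pattern in line:
                print(f"{member.name}:{lineno}")
                break                         # report each member at most once
EOF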
The easiest way is probably to use avfs. I've used this before for such tasks.
Basically, the syntax is:
avfsd ~/.avfs # Sets up an avfs virtual filesystem
rgrep pattern ~/.avfs/path/to/file.tar#/
/path/to/file.tar is the path to the actual tar file.
Pre-pending ~/.avfs/ (the mount point) and appending # lets avfs expose the tar file as a directory.
That's actually very easy with ugrep option -z:
-z, --decompress
Decompress files to search, when compressed. Archives (.cpio,
.pax, .tar, and .zip) and compressed archives (e.g. .taz, .tgz,
.tpz, .tbz, .tbz2, .tb2, .tz2, .tlz, and .txz) are searched and
matching pathnames of files in archives are output in braces. If
-g, -O, -M, or -t is specified, searches files within archives
whose name matches globs, matches file name extensions, matches
file signature magic bytes, or matches file types, respectively.
Supported compression formats: gzip (.gz), compress (.Z), zip,
bzip2 (requires suffix .bz, .bz2, .bzip2, .tbz, .tbz2, .tb2, .tz2),
lzma and xz (requires suffix .lzma, .tlz, .xz, .txz).
For example:
ugrep -z PATTERN archive.tgz
This greps each of the archived files to display PATTERN matches with the archived filenames. Archived filenames are shown in braces to distinguish them from ordinary filenames. Everything else is the same as grep (ugrep has the same options and produces the same output). For example:
$ ugrep -z "Hello" archive.tgz
{Hello.bat}:echo "Hello World!"
Binary file archive.tgz{Hello.class} matches
{Hello.java}:public class Hello // prints a Hello World! greeting
{Hello.java}: { System.out.println("Hello World!");
{Hello.pdf}:(Hello)
{Hello.sh}:echo "Hello World!"
{Hello.txt}:Hello
If you just want the file names, use option -l (--files-with-matches) and customize the filename output with option --format="%z%~" to get rid of the braces:
$ ugrep -z Hello -l --format="%z%~" archive.tgz
Hello.bat
Hello.class
Hello.java
Hello.pdf
Hello.sh
Hello.txt
Tarballs (.tar.gz/.tgz, .tar.bz2/.tbz, .tar.xz/.txz, .tar.lzma/.tlz) are searched as well as .zip archives.
You can mount the TAR archive with ratarmount and then simply search for the pattern in the mounted view:
pip install --user ratarmount
ratarmount large-archive.tar mountpoint
grep -r '<pattern>' mountpoint/
This should be much faster than iterating over each file and printing it to stdout, especially for compressed TARs.
Here is a simple comparison benchmark:
function checkFilesWithRatarmount()
{
local pattern=$1
local archive=$2
ratarmount "$archive" "$archive.mountpoint"
'grep' -r -l "$pattern" "$archive.mountpoint/"
}
function checkEachFileViaStdOut()
{
local pattern=$1
local archive=$2
tar --list --file "$archive" | while read -r file; do
if tar -x --file "$archive" -O -- "$file" | grep -q "$pattern"; then
echo "Found pattern in: $file"
fi
done
}
function createSampleTar()
{
for i in $( seq 40 ); do
head -c $(( 1024 * 1024 )) /dev/urandom | base64 > $i.dat
done
tar -czf "$1" [0-9]*.dat
}
createSampleTar myarchive.tar.gz
time checkEachFileViaStdOut ABCD myarchive.tar.gz
time checkFilesWithRatarmount ABCD myarchive.tar.gz
sleep 0.5s
fusermount -u myarchive.tar.gz.mountpoint
Results in seconds for a 55 MiB uncompressed and 42 MiB compressed TAR archive containing 40 files:
Compression   Ratarmount     Bash loop over tar -O
none          0.31 +- 0.01    0.55 +- 0.02
gzip           1.1 +- 0.1     13.5 +- 0.1
bzip2          1.2 +- 0.1     97.8 +- 0.2
Of course, these results are highly dependent on the archive size and how many files the archive contains. These test examples are pretty small because I didn't want to wait too long but they already show the problem. The more files there are, the longer it takes for tar -O to jump to the correct file. And for compressed archives, it will be quadratically slower the larger the archive size is because everything before the requested file has to be decompressed and each file is requested separately. Both of these problems are solved by ratarmount.

Filenames and linenumbers for the matches of cat and grep

My code
$ *.php | grep google
How can I print the filenames and linenumbers next to each match?
grep google *.php
if you want to span many directories:
find . -name \*.php -print0 | xargs -0 grep -n -H google
(as explained in comments, -H is useful if xargs comes up with only one remaining file)
You shouldn't be doing
$ *.php | grep
That means "run the first program matching *.php, with the names of the remaining matches as its parameters, and then run grep on the output".
It should be:
$ grep -n -H "google" *.php
The -n flag tells it to print line numbers, and the -H flag tells it to display the filename even if there's only one file. Grep will default to showing the filenames if there are multiple files being searched, but you probably want to make sure the output is consistent regardless of how many matching files there are.
grep -RH "google" *.php
Please take a look at ack at http://betterthangrep.com. The equivalent in ack of what you're trying is:
ack google --php
find ./*.php -exec grep -l 'google' {} \;
Use "man grep" to see other features.
for i in *.php; do grep -n --with-filename "google" "$i"; done
find . -name "*.php" -print | xargs grep -n "searchstring"

Use grep --exclude/--include syntax to not grep through certain files

I'm looking for the string foo= in text files in a directory tree. It's on a common Linux machine; I have a bash shell:
grep -ircl "foo=" *
In the directories are also many binary files which match "foo=". As these results are not relevant and slow down the search, I want grep to skip searching these files (mostly JPEG and PNG images). How would I do that?
I know there are the --exclude=PATTERN and --include=PATTERN options, but what is the pattern format? The man page of grep says:
--include=PATTERN Recurse in directories only searching file matching PATTERN.
--exclude=PATTERN Recurse in directories skip file matching PATTERN.
Searching on grep include, grep include exclude, grep exclude and variants did not find anything relevant
If there's a better way of grepping only in certain files, I'm all for it; moving the offending files is not an option. I can't search only certain directories (the directory structure is a big mess, with everything everywhere). Also, I can't install anything, so I have to do with common tools (like grep or the suggested find).
Use the shell globbing syntax:
grep pattern -r --include=\*.cpp --include=\*.h rootdir
The syntax for --exclude is identical.
Note that the star is escaped with a backslash to prevent it from being expanded by the shell (quoting it, such as --include="*.cpp", would work just as well). Otherwise, if you had any files in the current working directory that matched the pattern, the command line would expand to something like grep pattern -r --include=foo.cpp --include=bar.cpp rootdir, which would only search files named foo.cpp and bar.cpp, which is quite likely not what you wanted.
Update 2021-03-04
I've edited the original answer to remove the use of brace expansion, which is a feature provided by several shells such as Bash and zsh to simplify patterns like this; but note that brace expansion is not POSIX shell-compliant.
The original example was:
grep pattern -r --include=\*.{cpp,h} rootdir
to search through all .cpp and .h files rooted in the directory rootdir.
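Brace expansion happens before the command runs, which echo can demonstrate (the backslash keeps the star out of the shell's hands while the braces still expand):
$ echo --include=\*.{cpp,h}
--include=*.cpp --include=*.h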
If you just want to skip binary files, I suggest you look at the -I (upper case i) option. It ignores binary files. I regularly use the following command:
grep -rI --exclude-dir="\.svn" "pattern" *
It searches recursively, ignores binary files, and doesn't look inside Subversion hidden folders, for whatever pattern I want. I have it aliased as "grepsvn" on my box at work.
Please take a look at ack, which is designed for exactly these situations. Your example of
grep -ircl --exclude=*.{png,jpg} "foo=" *
is done with ack as
ack -icl "foo="
because ack never looks in binary files by default, and -r is on by default. And if you want only CPP and H files, then just do
ack -icl --cpp "foo="
grep 2.5.3 introduced the --exclude-dir parameter which will work the way you want.
grep -rI --exclude-dir=\.svn PATTERN .
You can also set an environment variable: GREP_OPTIONS="--exclude-dir=\.svn"
I'll second Andy's vote for ack though, it's the best.
I found this after a long time: you can add multiple includes and excludes, like:
grep -r "z-index" . --include=*.js --exclude=*js/lib/* --exclude=*.min.js
The suggested command:
grep -Ir --exclude="*\.svn*" "pattern" *
is conceptually wrong, because --exclude works on the basename. Put in other words, it will skip only the .svn in the current directory.
In grep 2.5.1 you have to add this line to ~/.bashrc or ~/.bash_profile
export GREP_OPTIONS="--exclude=\*.svn\*"
I find grepping grep's output to be very helpful sometimes:
grep -rn "foo=" . | grep -v "Binary file"
Though, that doesn't actually stop it from searching the binary files.
If you are not averse to using find, I like its -prune feature:
find [directory] \
-name "pattern_to_exclude" -prune \
-o -name "another_pattern_to_exclude" -prune \
-o -name "pattern_to_INCLUDE" -print0 \
| xargs -0 -I FILENAME grep -IR "pattern" FILENAME
On the first line, you specify the directory you want to search. . (current directory) is a valid path, for example.
On the 2nd and 3rd lines, use "*.png", "*.gif", "*.jpg", and so forth. Use as many of these -o -name "..." -prune constructs as you have patterns.
On the 4th line, you need another -o (it specifies "or" to find), the patterns you DO want, and you need either a -print or -print0 at the end of it. If you just want "everything else" that remains after pruning the *.gif, *.png, etc. images, then use
-o -print0 and you're done with the 4th line.
Finally, on the 5th line is the pipe to xargs which takes each of those resulting files and stores them in a variable FILENAME. It then passes grep the -IR flags, the "pattern", and then FILENAME is expanded by xargs to become that list of filenames found by find.
For your particular question, the statement may look something like:
find . \
-name "*.png" -prune \
-o -name "*.gif" -prune \
-o -name "*.svn" -prune \
-o -print0 | xargs -0 -I FILES grep -IR "foo=" FILES
On CentOS 6.6/Grep 2.6.3, I have to use it like this:
grep "term" -Hnir --include \*.php --exclude-dir "*excluded_dir*"
Notice the lack of equal signs "=" (otherwise --include, --exclude, include-dir and --exclude-dir are ignored)
git grep
Use git grep which is optimized for performance and aims to search through certain files.
By default it ignores binary files and it is honoring your .gitignore. If you're not working with Git structure, you can still use it by passing --no-index.
Example syntax:
git grep --no-index "some_pattern"
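It also accepts pathspecs after --, so the search can be limited to certain files, e.g. (a sketch):
git grep --no-index "some_pattern" -- '*.php'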
For more examples, see:
How to exclude certain directories/files from git grep search.
Check if all of multiple strings or regexes exist in a file
I'm a dilettante, granted, but here's how my ~/.bash_profile looks:
export GREP_OPTIONS="-orl --exclude-dir=.svn --exclude-dir=.cache --color=auto" GREP_COLOR='1;32'
Note that to exclude two directories, I had to use --exclude-dir twice.
If you search non-recursively you can use glob patterns to match the filenames.
grep "foo" *.{html,txt}
includes html and txt. It searches in the current directory only.
To search in the subdirectories:
grep "foo" */*.{html,txt}
In the subsubdirectories:
grep "foo" */*/*.{html,txt}
In the directories are also many binary files. I can't search only certain directories (the directory structure is a big mess). Is there a better way of grepping only in certain files?
ripgrep
This is one of the quickest tools designed to recursively search your current directory. It is written in Rust, built on top of Rust's regex engine for maximum efficiency. Check the detailed analysis here.
So you can just run:
rg "some_pattern"
It respects your .gitignore and automatically skips hidden files/directories and binary files.
You can still customize include or exclude files and directories using -g/--glob. Globbing rules match .gitignore globs. Check man rg for help.
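For the original question that might look like this (the ! prefix negates a glob):
rg "foo=" -g '!*.png' -g '!*.jpg'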
For more examples, see: How to exclude some files not matching certain extensions with grep?
On macOS, you can install via brew install ripgrep.
find and xargs are your friends. Use them to filter the file list rather than grep's --exclude.
Try something like
find . -type f ! -name '*.png' -print | xargs grep -icl "foo="
The advantage of getting used to this is that it is expandable to other use cases, for example to count the lines in all non-png files:
find . -type f ! -name '*.png' -print | xargs wc -l
To remove all non-png files:
find . -type f ! -name '*.png' -print | xargs rm
etc.
As pointed out in the comments, if some files may have spaces in their names, use -print0 and xargs -0 instead, as shown below.
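With those two changes the first pipeline becomes:
find . -type f ! -name '*.png' -print0 | xargs -0 grep -icl "foo="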
Try this one (it uses file to pick out the text files, then greps only those):
$ find . -type f -print | xargs file | grep -i text | cut -d: -f1 | xargs grep -l "foo="
Found here: http://www.unix.com/shell-programming-scripting/42573-search-files-excluding-binary-files.html
Those scripts don't solve the whole problem... Try this instead:
du -ha | grep -i -o "\./.*" | grep -v "\.svn\|another_file\|another_folder" | xargs grep -i -n "$1"
This works better because it uses "real" regular expressions to exclude directories from the search; just separate folder or file names with "\|" in the grep -v.
Enjoy!
Look at this one:
grep --exclude="*\.svn*" -rn "foo=" * | grep -v Binary | grep -v tags
The --binary-files=without-match option to GNU grep gets it to skip binary files. (Equivalent to the -I switch mentioned elsewhere.)
(This might require a recent version of grep; 2.5.3 has it, at least.)
suitable for tcsh .alias file:
alias gisrc 'grep -I -r -i --exclude="*\.svn*" --include="*\."{mm,m,h,cc,c} \!* *'
Took me a while to figure out that the {mm,m,h,cc,c} portion should NOT be inside quotes.
~Keith
To ignore all binary results from grep:
grep -Ri "pattern" * | awk '{if($1 != "Binary") print $0}'
The awk part filters out all the "Binary file foo matches" lines.
Try this:
Create a folder named "--F" under the current directory (or symlink another folder there renamed to "--F", i.e. double-minus-F), then:
#> grep -i --exclude-dir="\-\-F" "pattern" *
