Unix Find Replace Special Characters in Multiple Files - unix

I've got a set of files in a web root that all contain special characters that I'd like to remove (Â,€,â,etc).
My command
find . -type f -name '*.*' -exec grep -il "Â" {} \;
finds & lists out the files just fine, but my command
find . -type f -name '*.*' -exec tr -d 'Â' '' \;
doesn't produce the results I'm looking for.
Any thoughts?

To replace all non-ASCII characters in all files under the current directory, you could use:
find . -type f | xargs perl -pi.bak -e 's,[^[:ascii:]],,g'
Afterwards you will have to find and remove all the '.bak' files:
find . -type f -a -name \*.bak | xargs rm
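A minimal sketch of the same approach with NUL-separated filenames, so that names containing spaces survive the pipe. It assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do; POSIX does not require them), and it runs in a scratch directory with a made-up file name rather than on a real web root.

```shell
# Scratch directory with one sample file (hypothetical name containing a space).
demo=$(mktemp -d)
printf 'caf\303\251 costs \342\202\2545\n' > "$demo/my page.html"

# -print0 / -0 pass names NUL-separated, so spaces in paths are harmless.
find "$demo" -type f -print0 |
  xargs -0 perl -pi.bak -e 's/[^[:ascii:]]//g'

# Inspect the result before deleting the backups perl left behind.
cleaned=$(cat "$demo/my page.html")
printf '%s\n' "$cleaned"
find "$demo" -type f -name '*.bak' -exec rm -f {} +
```

Checking the cleaned file before removing the .bak copies keeps a way back if the pattern removed more than intended.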

I would recommend looking into sed. It can be used to replace the contents of the file.
So you could use the command:
find . -type f -name '*.*' -exec sed -i "s/Â//g" {} \;
I have tested this with a simple example and it seems to work. The -exec should handle files with whitespace in their name, but there may be other vulnerabilities I'm not aware of.
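A quick way to see what the substitution does is to run it on a sample line first. The sketch below uses made-up input and compares the expression with and without the g flag; without it, sed removes only the first match on each line.

```shell
# Sample line containing the unwanted character twice (illustrative input).
sample='ÂpriceÂ tag'

# Without the g flag only the first match on the line is removed.
first_only=$(printf '%s\n' "$sample" | sed 's/Â//')

# With g every match on the line is removed.
all_gone=$(printf '%s\n' "$sample" | sed 's/Â//g')

printf '%s\n%s\n' "$first_only" "$all_gone"
```

Once the output looks right, the same expression can go into the find command with sed -i. Note that -i is an extension: GNU and BSD sed both accept it, with slightly different syntax.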

Use
tr -d 'Â'
What does the '' stand for? On my system, your command produces this error:
tr: extra operand `'
Only one string may be given when deleting without squeezing repeats.
Try `tr --help' for more information.

sed 's/ø//' file.txt
That should do the trick for replacing a special char with an empty string.
find . -name "*.*" -exec sed 's/ø//' {} \;

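It would be helpful to know what "doesn't produce the results I'm looking for" means. However, in your command tr is never given the files to process, and it cannot take filenames as arguments in any case: tr only filters its standard input. You could feed each file to it through a small shell wrapper:
find . -type f -name '*.*' -exec sh -c 'tr -d "Â" < "$1"' sh {} \;
This writes everything to stdout. (Some tr implementations, GNU tr among them, work byte by byte, so a multibyte Â is treated as separate bytes.) You probably want to modify the files instead. You can use Grundlefleck's answer, but one of the issues alluded to in that answer is how it behaves with large numbers of files. You can do this: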
find . -type f -name '*.*' -print0 | xargs -0 -I{} sed -i "s/Â//g" \{\}
which should handle files with spaces in their names as well as large numbers of files.
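Because tr only filters standard input, using it per file needs a redirect and a temporary file. A minimal sketch, assuming the goal is to drop every non-ASCII byte (not just Â) and using a scratch directory with a hypothetical file name:

```shell
demo=$(mktemp -d)
printf 'na\303\257ve text\n' > "$demo/with space.txt"

# One sh per batch of files; each file is filtered through tr into a
# temporary copy, which replaces the original only if tr succeeded.
find "$demo" -type f -exec sh -c '
  for f do
    LC_ALL=C tr -cd "\000-\177" < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done' sh {} +

stripped=$(cat "$demo/with space.txt")
printf '%s\n' "$stripped"
```

Setting LC_ALL=C makes the byte-wise behaviour explicit, so the octal range \000-\177 means "plain ASCII" regardless of the user's locale.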

with bash shell
for file in *.*
do
case "$file" in
*[^[:ascii:]]* )
mv "$file" "${file//[^[:ascii:]]/}"
;;
esac
done

I would use something like this.
for file in `find . -type f`
do
# Search for the character and remove it. Save the result as file.new
sed -e 's/[ۉ]//g' $file > $file.new
# Move file.new over file. DON'T RUN THIS UNLESS YOU WANT TO OVERWRITE THE ORIGINAL FILE
mv $file.new $file
done
As levislevis85 has mentioned, the above script will fail on filenames that contain spaces. That is not the case with the following code:
find . -type f | while IFS= read -r file
do
# Search for the character and remove it. Save the result as file.new
sed -e 's/[ۉ]//g' "$file" > "$file".new
# Move file.new over file. DON'T RUN THIS UNLESS YOU WANT TO OVERWRITE THE ORIGINAL FILE
mv "$file".new "$file"
done
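The difference between the two loops can be seen by counting iterations. A minimal sketch, assuming a scratch directory whose own path contains no whitespace and one invented file name with a space in it:

```shell
demo=$(mktemp -d)
: > "$demo/two words.txt"

# Unquoted command substitution: the shell splits the path on the space.
split_count=0
for f in $(find "$demo" -type f); do
  split_count=$((split_count + 1))
done

# Reading line by line keeps the path intact
# (a newline inside a filename would still break it).
read_count=$(find "$demo" -type f | while IFS= read -r f; do echo x; done | wc -l)
read_count=$((read_count))   # strip padding that some wc implementations add

printf 'for loop saw %s items, read loop saw %s\n' "$split_count" "$read_count"
```

IFS= stops read from trimming leading and trailing whitespace, and -r stops it from treating backslashes as escapes.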

Related

BASH: performing a regex replace on a path from find command

AIM: to find all JS|TS excluding *.spec.js files in a directory but replace the base path with ./
I have this command
find src/app/directives -name '*.[j|t]s' ! -name '*.spec.js' -exec printf "import \"%s\";\n" {} \;
which in said directory prints the marked JS files. However I want to replace the src/app with ./
I've tried playing with [[]] and this command but they don't work.
find src/app/components -name '*.[j|t]s' ! -name '*.spec.js' -exec printf "import \"%s\";\n" ${{}/src
/hi} \;
zsh: bad substitution
Given your "AIM", all you really need is:
find src/app/directives -type f -name "*.[jt]s" ! -name "*.spec.js" -printf "./%f\n"
The '|' in your bracket expression is not an alternation operator: [j|t] matches a literal j, | or t, so it does no harm here, but [jt] is all you need. Your ! -name "*.spec.js" test is fine. You don't need -exec and can simply use -printf "./%f\n", which is available in GNU find (where "%f" provides the filename only for the current file, with leading directories removed). You simply prepend the "./" as part of the -printf format string.
Let me know if I misunderstood your AIM or if you have further questions.
Removing src/app/directives While Preserving Remaining Path
If you want to preserve the remainder of the path after src/app/directives (essentially just replacing that prefix with '.'), you can use a short helper script. It uses POSIX parameter expansion to trim src/app/directives from the front of the string, and printf to put '.' in its place. For example, the helper could be:
#!/bin/zsh
printf ".%s\n" "${1#src/app/directives}"
(note: find prints each path starting with the directory it was given, here src/app/directives, so that is the prefix to trim; the '.' added by the printf format string means the returned filename is ./rest/of/path/to/filename)
Call the script whatever you like, helper.sh below. Make it executable with chmod +x helper.sh.
The find call would then be:
find src/app/directives -type f -name "*.[jt]s" ! -name "*.spec.js" -exec path/to/helper.sh '{}' \;
Give that a go and let me know if it does what you are needing.
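The helper can also be inlined with sh -c, which avoids a separate script file. A minimal sketch, assuming the goal stated in the question (import lines with src/app replaced by .); the directory layout and file names are invented for the demonstration:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src/app/directives/sub"
: > "$demo/src/app/directives/a.js"
: > "$demo/src/app/directives/a.spec.js"
: > "$demo/src/app/directives/sub/b.ts"

# find hands batches of paths to sh; ${f#src/app/} trims the leading prefix.
imports=$(cd "$demo" &&
  find src/app/directives -type f -name '*.[jt]s' ! -name '*.spec.js' \
    -exec sh -c 'for f do printf "import \"./%s\";\n" "${f#src/app/}"; done' sh {} + |
  sort)
printf '%s\n' "$imports"
```

The sort is only there to make the output order predictable; find itself returns files in directory order.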

UNIX, find string in all files in sub directory with line numbers and filenames

I'm trying to do a recursive text string search in UNIX and have the results show both the filename and line number on which the text appears within the file. Based on some other answers here I have the following code, but it only shows line numbers and not filenames:
find /my/directory -type f -exec grep -ni "text to search" {} \;
It would also be great to have this command ignore everything except for .LOG files. For what it's worth, grep -r is not supported on my system. Thanks!
What about:
find /my/directory -type f -name "*.LOG" -print0 | xargs -0 grep -Hni "text to find"
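When your find and grep don't support advanced options, try adding /dev/null as an extra file argument. With more than one file to search, grep prefixes every match with the filename: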
find /my/directory -type f -exec grep -ni "text to search" {} /dev/null \;
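The effect of the extra /dev/null operand is easy to check on a single file. A minimal sketch with an invented log file in a scratch directory:

```shell
demo=$(mktemp -d)
printf 'first line\ntext to search\n' > "$demo/app.LOG"

# One file operand: grep prints only the line number and the line.
single=$(grep -ni "text to search" "$demo/app.LOG")

# Adding /dev/null as a second operand makes grep prefix the filename.
named=$(grep -ni "text to search" "$demo/app.LOG" /dev/null)

printf '%s\n%s\n' "$single" "$named"
```

/dev/null never matches anything, so it changes the output format without adding results.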

find and then grep and then iterate through list of files

I have following script to replace text.
grep -l -r "originaltext" . |
while read fname
do
sed 's/originaltext/replacementText/g' $fname > tmp.tmp
mv tmp.tmp $fname
done
Now in the first statement of this script , I want to do something like this.
find . -name '*.properties' -exec grep "originaltext" {} \;
How do I do that?
I work on AIX, So --include-file wouldn't work .
In general, I prefer to use find to FIND files rather than grep. It looks obvious : )
Using process substitution you can feed the while loop with the result of find:
while IFS= read -r fname
do
sed 's/originaltext/replacementText/g' "$fname" > tmp.tmp
mv tmp.tmp "$fname"
done < <(find . -name '*.properties' -exec grep -l "originaltext" {} \;)
Note I use grep -l (lowercase L) so that grep just returns the names of the files matching the pattern.
You could go the other way round and give the list of '*.properties' files to grep. For example
grep -l "originaltext" `find . -name '*.properties'`
Oh, and if you're on a recent linux distribution, there is an option in grep to achieve that without having to create that long list of files as argument
grep -l "originaltext" --include='*.properties' -r .
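Putting the pieces together without process substitution or grep -r: a minimal sketch that lets find select the files, grep -l pick the ones that match, and a read loop rewrite them. It assumes a find that supports the POSIX -exec ... {} + form, and uses invented file names in a scratch directory.

```shell
demo=$(mktemp -d)
printf 'key=originaltext\n' > "$demo/a.properties"
printf 'key=other\n'        > "$demo/b.properties"
printf 'originaltext\n'     > "$demo/notes.txt"

# grep -l prints only the names of matching files; {} + batches them.
find "$demo" -name '*.properties' -exec grep -l "originaltext" {} + |
while IFS= read -r fname; do
  # Write to a temporary file next to the original, then replace it.
  sed 's/originaltext/replacementText/g' "$fname" > "$fname.tmp" &&
    mv "$fname.tmp" "$fname"
done

cat "$demo/a.properties" "$demo/notes.txt"
```

Only a.properties is rewritten: b.properties does not contain the text, and notes.txt is never handed to grep.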

Remove underscores from all filenames within a directory

I have a folder "model" with files named like:
a_EmployeeData
a_TableData
b_TestData
b_TestModel
I basically need to drop the underscore and make them:
aEmployeeData
aTableData
bTestData
bTestModel
Is there a way to do this from the Unix command line?
This will correctly process files containing odd characters like spaces or even newlines and should work on any Unix / Linux distribution being only based on POSIX syntax.
find model -type f -name "*_*" -exec sh -c 'd=$(dirname "$1"); mv "$1" "$d/$(basename "$1" | tr -d _)"' sh {} \;
Here is what it does:
For each file (not directory) containing an underscore in its name under the model directory and its subdirectories, rename the file in place with all the underscores stripped out.
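The same idea can be tried safely in a scratch directory first. A minimal sketch with invented file names, using the batched -exec ... {} + form so that one shell handles many files:

```shell
demo=$(mktemp -d)
mkdir "$demo/model"
: > "$demo/model/a_EmployeeData"
: > "$demo/model/b_Test_Model"
: > "$demo/model/plain"

# Split each path into directory and base name so that only the base name
# is rewritten; underscores in directory names are left alone.
find "$demo/model" -type f -name '*_*' -exec sh -c '
  for path do
    dir=$(dirname "$path")
    new=$(basename "$path" | tr -d _)
    mv "$path" "$dir/$new"
  done' sh {} +

listing=$(ls "$demo/model")
printf '%s\n' "$listing"
```

Putting echo in front of mv turns this into a dry run that only prints the commands it would execute.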
You can do this simply with bash.
for file in /path/to/model/*; do
mv "$file" "${file/_/}"
done
If you have rename command available then simply do
rename 's/_//' /path/to/model/*
for f in model/* ; do mv "$f" "$(echo "$f" | sed 's/_//g')" ; done
Edit: modified a few things thanks to suggestions by others, but I'm afraid my code is still bad for strange filenames.
Maybe this:
find model -maxdepth 1 -name "*_*" -type f -print | sed -e 'p;s/_//g' | xargs -n2 echo mv
Decomposition:
find all plain files in the directory model whose names contain at least one underscore, without searching subdirectories
with sed, print the old name first (p), then the name with every _ removed
feed the two filenames to xargs, which passes them to mv in pairs
The above is a dry run. When satisfied, remove the echo before mv for the actual rename.
Warning: this will not work if a filename contains spaces. If you have GNU sed you can use
find . -maxdepth 1 -name "*_*" -print0 | sed -z 'p;s/_//g' | xargs -0 -n2 echo mv
which also works with filenames that contain spaces.
In zsh:
autoload zmv # in ~/.zshrc
cd model && zmv '(**/)(*)' '$1${2//_}'
marc#panic:~$ echo 'a_EmployeeData' | tr -d '_'
aEmployeeData
I had the same problem on my machine, but the filenames had more than one underscore. I used rename with the g option so that all underscores get removed:
find model/ -maxdepth 1 -type f | rename 's/_//g'
Or if there are no subdirectories, just pass the files as arguments:
rename 's/_//g' model/*
If you don't have rename, see Jaypal Singh's answer.
Use the global flag /g with your replace pattern to replace all occurrences within the filename.
find . -type f -print0 | xargs -0 rename 's/_//g'
Or if you want underscores replaced with spaces then use this:
find . -type f -print0 | xargs -0 rename 's/_/ /g'
If you like to live dangerously, add the force flag -f in front of your replace pattern: rename -f 's/_//g'

How do I concatenate files in a subdirectory with Unix find execute and cat into a single file?

I can do this:
$ find .
.
./b
./b/foo
./c
./c/foo
And this:
$ find . -type f -exec cat {} \;
This is in b.
This is in c.
But not this:
$ find . -type f -exec cat > out.txt {} \;
Why not?
find's -exec argument runs the command you specify once for each file it finds. Try:
$ find . -type f -exec cat {} \; > out.txt
or:
$ find . -type f | xargs cat > out.txt
xargs converts its standard input into command-line arguments for the command you specify. If you're worried about embedded spaces in filenames, try:
$ find . -type f -print0 | xargs -0 cat > out.txt
Hmm... find seems to be picking up out.txt itself, because you are writing it into the directory being searched.
Try something like
find . -type f -exec cat {} \; > ../out.txt
You could do something like this :
$ cat `find . -type f` > out.txt
How about just redirecting the output of find into a file, since all you're wanting to do is cat all the files into one large file:
find . -type f -exec cat {} \; > /tmp/out.txt
Maybe you've inferred from the other responses that the > symbol is interpreted by the shell before find gets it as an argument. But to answer your "why not" lets look at your command, which is:
$ find . -type f -exec cat > out.txt {} \;
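So the shell takes "> out.txt" out of the command line, wherever it appears, and sets up the redirection before find starts. find then receives the arguments "." "-type" "f" "-exec" "cat" "{}" and ";", which is exactly what it would get with the redirect written at the end. The redirection does not swallow "{}" or ";". The real trouble is that out.txt is created in the directory being searched, so find can pick it up and pass it to cat while cat's output is being written to it.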
Looking at the other suggestions you should really avoid creating the output in the same directory you're finding in. But they'd work with that in mind. And the -print0 | xargs -0 combination is greatly useful. What you wanted to type was probably more like:
$ find . -type f -exec cat \{} \; > /tmp/out.txt
Now if you really only have one level of sub directories and only normal files, you can do something silly and simple like this:
cat `ls -p|sed 's/\/$/\/*/'` > /tmp/out.txt
This gets ls to list all your files and directories, appending '/' to the directories, while sed appends a '*' to each of those. The shell then interprets this list and expands the globs. Assuming that doesn't result in too many files for the shell to handle, these will all be passed as arguments to cat, and the output will be written to /tmp/out.txt.
Or just leave out the find which is useless if you use the really great Z shell (zsh), and you can do this:
setopt extendedglob
(this should be in your .zshrc)
Then:
cat **/*(.) > outfile
just works :-)
Try this:
(find . -type f -exec cat {} \;) > out.txt
In bash you could do
cat $(find . -type f) > out.txt
with $( ) you can get the output from a command and pass it to another
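If the output file has to live in the directory being searched, it can be excluded by name. A minimal sketch that recreates the layout from the question in a scratch directory; the ! -name out.txt test is the only addition:

```shell
demo=$(mktemp -d)
mkdir "$demo/b" "$demo/c"
echo 'This is in b.' > "$demo/b/foo"
echo 'This is in c.' > "$demo/c/foo"

# The redirect is handled by the shell, so out.txt exists before find starts;
# ! -name out.txt keeps find from feeding the output file back into cat.
(cd "$demo" && find . -type f ! -name out.txt -exec cat {} + > out.txt)

combined=$(sort "$demo/out.txt")
printf '%s\n' "$combined"
```

The order in which find visits files is not specified, so the sketch sorts the result only to make it predictable for display.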

Resources