I am attempting to come up with a method to remotely find a list of files on our AIX UNIX machine that meet what seems, in Windows, like simple criteria: the search needs to be case-insensitive (sigh), use wildcards (*), and cope with spaces in the path.
For my tests below I was using the ksh shell; however, it will need to work over an ssh session as well.
I am implementing secure FTP in Visual Basic 6 (I know) using plink, the command line, and a batch file.
Basically, find a file like the one below, but case-insensitively:
ls -1 -d -p "/test/rick/01012017fosterYYY - Copy.txt" | grep -v '.*/$'
Thanks for any help.
ls -1 -d -p /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy] - [Cc][Oo][Pp][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
fails with:
ls: 0653-341 The file /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy] does not exist.
ls: 0653-341 The file - does not exist.
ls: 0653-341 The file [Cc][Oo][Pp][Yy].[Tt][Xx][Tt] does not exist.
ls -1 -d -p /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
success - as long as there are no spaces.
ls -1 -d -p "/test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy].[Tt][Xx][Tt]" | grep -v '.*\/$'**
fails with:
ls: 0653-341 The file /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy].[T
t][Xx][Tt] does not exist.
-- Assumption: We cannot use quotes with wildcard characters
ls -1 -d -p "/test/rick/01012017fosterYYY - Copy.txt" | grep -v '.*\/$'**
success. not case insensitive.
ls -1 -d -p /test/rick/[0][1][0][1][2][0][1][7][Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy] - [Cc][Oo][Pp][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
fails with:
ls: 0653-341 The file /test/rick/[0][1][0][1][2][0][1][7][Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy][ does not exist.
ls: 0653-341 The file ][-][ does not exist.
ls: 0653-341 The file ][Cc][Oo][Pp][Yy].[Tt][Xx][Tt] does not exist.
ls -1 -d -p /test/rick/[0][1][0][1][2][0][1][7][Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy][ ][-][ ][Cc][Oo][Pp][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
fails with:
ls: 0653-341 The file /test/rick/[0][1][0][1][2][0][1][7][Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy][ does not exist.
ls: 0653-341 The file ][-][ does not exist.
ls: 0653-341 The file ][Cc][Oo][Pp][Yy].[Tt][Xx][Tt] does not exist.
ls -1 -d -p /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy]?-?[Cc][Oo][Pp][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
success, though ? matches any single character, so this is not very precise.
ls -1 -d -p /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy][ ]-[ ][Cc][Oo][Pp][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
fails with:
ls: 0653-341 The file /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy][ does not exist.
ls: 0653-341 The file ]-[ does not exist.
ls: 0653-341 The file ][Cc][Oo][Pp][Yy].[Tt][Xx][Tt] does not exist.
ls -1 -d -p /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy]{ }-{ }[Cc][Oo][Pp][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
fails with:
ls: 0653-341 The file /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy]{ does not exist.
ls: 0653-341 The file }-{ does not exist.
ls: 0653-341 The file }[Cc][Oo][Pp][Yy].[Tt][Xx][Tt] does not exist.
ls -1 -d -p /test/rick/*01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy] - [Cc][Oo][Pp][Yy].[Tt][Xx][Tt]* | grep -v '.*\/$'
fails with:
ls: 0653-341 The file /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy] does not exist.
ls: 0653-341 The file - does not exist.
ls: 0653-341 The file [Cc][Oo][Pp][Yy].[Tt][Xx][Tt] does not exist.
ls -1 -d -p "/test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy] - [Cc][Oo][Pp][Yy].[Tt][Xx][Tt]" | grep -v '.*\/$'**
fails with:
ls: 0653-341 The file /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy] -
[Cc][Oo][Pp][Yy].[Tt][Xx][Tt] does not exist.
ls doesn't do pattern matching; any wildcard expansion (globbing) is done by the shell. The glob pattern language is different from regular expressions. Read the ksh documentation for information about globbing ("File Name Generation" in the manpage).
So when you do:
$ touch foo flo fum
$ ls -1 f[ol]o
flo
foo
... the shell notices the globbing characters [], reads the directory contents, replaces the pattern with the matching file names, and passes those as parameters to ls. You can see this by using echo instead:
$ echo f[ol]o
flo foo
ksh has globbing options available with the ~() construct; option i is "Treat the match as case insensitive":
ksh$ touch foo FoO FOO
ksh$ echo ~(i)foo
foo FoO FOO
bash has a nocaseglob shopt option:
bash$ shopt -s nocaseglob
bash$ touch fOo
bash$ echo FO*
fOo
Although note that some globbing character needs to be present to make the magic happen:
bash$ echo FOO
FOO
bash$ echo [F]OO
fOo
(to keep this option change local, see https://unix.stackexchange.com/questions/310957/how-to-undo-a-set-x/310963)
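Per that link, one way to keep the option change local is to set it inside a subshell, so the parent shell's globbing behavior is untouched. A minimal sketch:

bash$ ( shopt -s nocaseglob; echo FO* )
fOo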
It looks as if you're using grep -v '.*/$' to remove lines that are directories. The .* is superfluous here -- grep -v '/$' is equivalent.
But find is a better tool for this kind of searching and filtering: it implements -type f (match regular files) by actually looking at the file attributes, rather than by parsing a bit of ASCII in a listing.
$ touch foo FOO FoO
$ mkdir fOo
$ find . -maxdepth 1 -type f -iname "foo"
./FOO
./foo
./FoO
You could use find's -iname option to allow for case-insensitive searching, so for the example you've provided any of the following should find your file:
find /test/rick -maxdepth 1 -iname '01012017fosterYYY - copy.txt'
# or
find /test/rick -maxdepth 1 -iname '01012017fosteryyy - copy.txt'
# or
find /test/rick -maxdepth 1 -iname '01012017FOSTERyyy - cOpY.txt'
-maxdepth 1 : don't search in sub-directories
-iname : allow for case-insensitive searching
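Where those flags are available, you can also let find exclude directories itself instead of piping through grep -v '/$'; adding -type f restricts matches to regular files:

find /test/rick -maxdepth 1 -type f -iname '01012017fosteryyy - copy.txt'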
For case-insensitive wildcard searches when the -maxdepth and -iname flags are not available (as with AIX find), you can pass the find results to grep:
find /test/rick/. \( ! -name . -prune \) -type f -print | grep -i ".*foster.*\.txt"
find [InThisFolder] [ExcludeSubfolders] [FileTypes] | grep [InsensitiveWildcardName]
Though, this can still be problematic if you have a folder structure like "/test/rick/rick/".
The following command also gives results containing the current-directory signifier ".":
find /test/rick/. \( ! -name . -prune \) -type f -print | grep -i ".*foster.*\.txt"
But you can pass the results to sed to find "/./" and replace it with "/":
find /test/rick/. \( ! -name . -prune \) -type f -print | grep -i ".*foster.*\.txt" | sed 's/\/\.\//\//g'
* UPDATE *
Based on this page: http://mywiki.wooledge.org/ParsingLs
I've come up with the following (a for loop over the shell's file name expansion, i.e. globbing), which avoids the problematic "/test/rick/rick/" folder structure of the find | grep solution above. It searches a folder from any folder, handles spaces, and handles case insensitivity without having to specify escape characters or upper/lower bracket pairs ([Aa]).
Just modify the searchfolder and searchpattern:
searchfolder="/test/rick"
searchpattern="*foster*.txt"
for file in "$searchfolder"/*.*; do
    [[ -e "$file" ]] || continue
    if [[ "$(basename "$file" | tr '[:upper:]' '[:lower:]')" = $searchpattern ]]; then
        echo "$file"
    fi
done
It does this:
Set the folder path to search (searchfolder="/test/rick")
Set the search pattern (searchpattern="*foster*.txt")
Loop over every file in the search folder (for file in "$searchfolder"/*.*)
Make sure the file exists ([[ -e "$file" ]] || continue)
Transform any uppercase characters in the base file name to lowercase (basename "$file" | tr '[:upper:]' '[:lower:]')
Test whether the lowered base file name matches the search pattern, and if so print the full path and filename (if [[ ... = $searchpattern ]]; then echo "$file"; fi)
Tested on AIX (Version 6.1.0.0) in ksh (Version M-11/16/88f) and ksh93 (Version M-12/28/93e).
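For reuse, the same logic could be wrapped in a small function. A sketch (findci is a made-up name, not a standard utility):

# findci <folder> <lowercase glob> : hypothetical helper wrapping the loop above
findci() {
    searchfolder=$1
    searchpattern=$2
    for file in "$searchfolder"/*.*; do
        [[ -e "$file" ]] || continue
        if [[ "$(basename "$file" | tr '[:upper:]' '[:lower:]')" = $searchpattern ]]; then
            echo "$file"
        fi
    done
}

# usage:
findci /test/rick '*foster*.txt'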
What I finally used (because I don't have access to -maxdepth or -iname) was case-insensitive bracket wildcards together with quotes around the spaces.
ls -1 -d -p /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy]' '-' '[Cc][Oo][Pp][Yy].[Tt][Xx][Tt] | grep -v '.*\/$'
That way I don't have to install or upgrade anything (and probably cause more problems) just to get a simple list of files.
NOTE: AIX will still throw in some garbage errors if there are any subdirectories under the path. I tapped out on this and just parsed those useless messages out on the client side.
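If those messages aren't needed at all, an alternative to client-side parsing would be to discard stderr on the AIX side, assuming nothing else important is written there:

ls -1 -d -p /test/rick/01012017[Ff][Oo][Ss][Tt][Ee][Rr][Yy][Yy][Yy]' '-' '[Cc][Oo][Pp][Yy].[Tt][Xx][Tt] 2>/dev/null | grep -v '/$'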
Thanks everyone who responded.
Related
I wrote a script in R that has several arguments. I want to iterate over 20 directories and execute my script on each while passing in a substring from the file path as my -n argument, using sed. I ran the following:
find . -name 'xray_data' -exec sh -c 'Rscript /Users/Caitlin/Desktop/DeMMO_Pubs/DeMMO_NativeRock/DeMMO_NativeRock/R/scipts/dataStitchR.R -f {} -b "{}/SEM_images" -c "{}/../coordinates.txt" -z ".tif" -m ".tif" -a "Unknown|SEM|Os" -d "overview" -y "overview" --overview "overview.*tif" -p FALSE -n "`sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/' "{}"`"' sh {} \;
which results in this error:
ubs/DeMMO_NativeRock/DeMMO_NativeRock/R/scipts/dataStitchR.R -f {} -b "{}/SEM_images" -c "{}/../coordinates.txt" -z ".tif" -m ".tif" -a "Unknown|SEM|Os" -d "overview" -y "overview" --overview "overview.*tif" -p FALSE -n "`sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/' "{}"`"' sh {} \;
sh: command substitution: line 0: syntax error near unexpected token `('
sh: command substitution: line 0: `sed -e s/.*DeMMO.*[/](.*)_.*[/]xray_data/1/ "./DeMMO1/D1T3rep_Dec2019_Ellison/xray_data"'
When I try to use sed with my pattern on an example file path, it works:
echo "./DeMMO1/D1T1exp_Dec2019_Poorman/xray_data" | sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/'
which produces the correct substring:
D1T1exp_Dec2019
I think there's an issue with trying to use single quotes inside the interpreted string, but I don't know how to deal with it. I have tried replacing the single quotes around the sed pattern with double quotes, as well as removing them; both result in this error:
sed: RE error: illegal byte sequence
How should I extract the substring from the file path dynamically in this case?
To loop through the output of find:
while IFS= read -ru "$fd" -d '' files; do
echo "$files" ##: do whatever you want to do with the files here.
done {fd}< <(find . -type f -name 'xray_data' -print0)
No embedded commands in quotes.
It uses a random fd just in case something inside the loop is eating/slurping stdin.
Also, -print0 delimits the files with null bytes, so it should be safe enough to handle spaces, tabs, and newlines in paths and file names.
A good habit is to put an echo in front of every command you want to run on the files, so you have an idea of what's going to be executed before it happens...
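Applied to this question, the sed extraction can move into the loop body, which avoids nesting it inside -exec's quoting entirely. A sketch (the Rscript arguments are abbreviated and the script path is assumed; the sed pattern is the one from the question):

while IFS= read -ru "$fd" -d '' dir; do
    # extract the sample ID from the path
    sampleID=$(printf '%s\n' "$dir" | sed -e 's/.*DeMMO.*[/]\(.*\)_.*[/]xray_data/\1/')
    # echo first to preview; drop the echo to actually run Rscript
    echo Rscript dataStitchR.R -f "$dir" -b "$dir/SEM_images" -n "$sampleID"
done {fd}< <(find . -name 'xray_data' -print0)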
This is the solution that ultimately worked for me due to issues with quotes in sed:
for dir in `find . -name 'xray_data'`;
do sampleID="`basename $(dirname $dir) | cut -f1 -d'_'`";
Rscript /Users/Caitlin/Desktop/DeMMO_Pubs/DeMMO_NativeRock/DeMMO_NativeRock/R/scipts/dataStitchR.R -f "$dir" -b "$dir/SEM_images" -c "$dir/../coordinates.txt" -z ".tif" -m ".tif" -a "Unknown|SEM|Os" -d "overview" -y "overview" --overview "overview.*tif" -p FALSE -n "$sampleID";
done
How do I find files with a specific pattern in the parent and child directories of my present working directory using a single command?
Filename: test.txt; the file contains the pattern "nslookup".
This file is present in 3 directories: /home, /home/1 and /home/1/2.
I am currently at /home/1. I have tried the commands below:
find ../ -type f -name "test.txt"
Output :
../test.txt
../home/1/test.txt
../home/1/2/test.txt
I was able to find the files, so I tried the command below:
$ find ../ -type f -exec grep "nslookup" {} \;
nslookup
nslookup
nslookup
This doesn't display the file names.
Command:
find . -type f -name "test.txt" | xargs grep "nslookup"
This gives me files in the pwd and child directories:
./1/test.txt:nslookup
./test.txt:nslookup
but when I try to search in the parent directory as shown below, the results are erroneous:
find ../ -type f -name "test.txt" | xargs grep "nslookup"
User#User-PC ~/test
$ uname -a
CYGWIN_NT-6.1 User-PC 2.5.2(0.297/5/3) 2016-06-23 14:29 x86_64 Cygwin
How about this:
grep -r -l nslookup .. | grep test.txt
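Alternatively, sticking with find: grep's -l flag prints the names of matching files, and the classic /dev/null trick forces grep to prefix filenames to matching lines even when it is handed a single file at a time:

find ../ -type f -name "test.txt" -exec grep -l "nslookup" {} \;
find ../ -type f -name "test.txt" -exec grep "nslookup" {} /dev/null \;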
I have the directory D:/Temp, which contains a lot of subfolders with text files. Each folder has a "file.txt". Some file.txt files contain the word "pattern". I would like to check how many of them there are, and also get the file path of each matching file.txt:
find D:/Temp -type f -name "file.txt" -exec basename {} cat {} \; | sed -n '/pattern/p' | wc -l
Output should be:
4
D:/Temp/abc1/file.txt
D:/Temp/abc2/file.txt
D:/Temp/abc3/file.txt
D:/Temp/abc4/file.txt
Or similar.
You could use GNU grep:
grep -lr --include file.txt "pattern" "D:/Temp/"
This will return the file paths.
grep -cr --include file.txt "pattern" "D:/Temp/"
This will return the count (counting pattern occurrences rather than the number of files).
Explanation of the flags:
-r makes grep recursively browse its target, which can then be a directory
--include <glob> makes grep restrict its recursive browsing to files matching the <glob>
-l makes grep only return the file paths. Additionally, it will stop parsing a file as soon as it has encountered the pattern.
-c makes grep only return the number of matches
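Note that with -c, files that don't contain the pattern are still listed with a count of 0; filtering those out should leave just the matching files with their counts:

grep -cr --include file.txt "pattern" "D:/Temp/" | grep -v ':0$'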
If your file names don't contain spaces then all you need is:
awk '/pattern/{print FILENAME; cnt++; nextfile} END{print cnt+0}' $(find D:/Temp -type f -name "file.txt")
The above uses GNU awk for nextfile.
I'd propose using two commands: one to find all the files:
find ./ -name "file.txt" -exec fgrep -l "pattern" {} \;
and another to count them:
find ./ -name "file.txt" -exec fgrep -l "pattern" {} \; | wc -l
Previously I've used:
grep -Hc "pattern" $(find D:/temp -type f -name "file.txt")
This will only work if a file.txt is found. Otherwise you could use the following, which accounts for both cases (file.txt found or not):
searchFiles=$(find D:/temp -type f -name "file.txt"); [[ ! -z "$searchFiles" ]] && grep -Hc "pattern" $searchFiles
The output for this would look more like:
D:/Temp/abc1/file.txt 2
D:/Temp/abc2/file.txt 1
D:/Temp/abc3/file.txt 1
D:/Temp/abc4/file.txt 1
I would use:
find D:/Temp -type f -name "file.txt" -exec dirname {} \; > tmpfile
wc -l tmpfile
cat tmpfile
rm tmpfile
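The same idea with mktemp avoids clobbering any existing tmpfile (a sketch, assuming mktemp is available on your system):

tmpfile=$(mktemp)
find D:/Temp -type f -name "file.txt" -exec dirname {} \; > "$tmpfile"
wc -l < "$tmpfile"
cat "$tmpfile"
rm "$tmpfile"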
Give this safe and standard version a try:
find D:/Temp -type f -name file.txt -printf "%p\0" | xargs -0 bash -c 'grep -c "pattern" "$@"' bash | grep ":[1-9][0-9]*$"
For each file.txt found in the D:/Temp directory and its sub-directories, grep -c prints the filename and the number of lines which contain pattern.
A final grep ":[1-9][0-9]*$" keeps only filenames with a count greater than 0.
The way I'm reading your question, I'm going to answer as if:
some but not all file.txt files contain pattern,
you want a list of the paths leading to file.txt with pattern, and
you want a count of pattern in each of those files.
There are a few options. (Always multiple ways to do anything.)
If your bash is version 4 or higher, you can use globstar to recurse through directories:
shopt -s globstar
for file in **/file.txt; do
if count=$(grep -c 'pattern' "$file"); then
printf "%d %s\n" "$count" "${file%/*}"
fi
done
This works because the if evaluation considers a failed grep (i.e. zero occurrences) to be FALSE, and thus does not print results.
Note that this may be high-impact because it launches a separate grep on each file that is found. A lighter-weight alternative might be to run a single grep over the file glob and parse the results:
shopt -s globstar
grep -c 'pattern' **/file.txt | grep -v ':0$'
This also depends on bash 4, and of course if you have millions of files you may overwhelm bash's maximum command-line length. The output of this will be obvious, but you'll need to parse it with care if your filenames contain colons; i.e. cut -d: -f2 may not cut it.
One more option that leverages grep instead of bash might be:
grep -r --include 'file.txt' -c 'pattern' ./ | grep -v ':0$'
This uses GNU grep's --include option, which modifies the behaviour of -r (recursive). It should work in Linux, FreeBSD, NetBSD, OSX, but not with the default grep on OpenBSD or most SVR4 systems (Solaris, HP/UX, etc).
Note that I have tested none of these. No liability assumed. May contain nuts.
This should do it:
find . -name "file.txt" -type f -exec grep -l "pattern" {} \; | awk '{print} END { print NR }'
I want to copy my directory structure, excluding the files. Is there any option in tar to ignore all files and copy only the directories recursively?
You can use find to get the directories and then tar them:
find . -type d -print0 | xargs -0 tar cf dirstructure.tar --no-recursion
If you have more than about 10000 directories, use the following to work around xargs limits:
find . -type d -print0 | tar cf dirstructure.tar --no-recursion --null --files-from -
Directory names that contain spaces or other special characters may require extra attention. For example:
$ mkdir -p "backup/My Documents/stuff"
$ find backup/ -type d | xargs tar cf directory-structure.tar --no-recursion
tar: backup/My: Cannot stat: No such file or directory
tar: Documents: Cannot stat: No such file or directory
tar: backup/My: Cannot stat: No such file or directory
tar: Documents/stuff: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
Here are some variations to handle these cases of "unusual" directory names:
$ find backup/ -type d -print0 | xargs -0 tar cf directory-structure.tar --no-recursion
Using -print0 with find will emit filenames as null-terminated strings; with -0 xargs will interpret arguments that same way. Using null as a terminator helps ensure that even filenames with spaces and newlines will be interpreted correctly.
It's also possible to pipe results straight from find to tar:
$ find backup/ -type d | tar cf directory-structure.tar -T - --no-recursion
Invoking tar with -T - (or --files-from -) will cause it to read filenames from stdin, expecting each filename to be separated by a line break.
For maximum effect this can be combined with options for null-terminated strings:
$ find . -type d -print0 | tar cf directory-structure.tar --null --files-from - --no-recursion
Of these I consider this last version to be the most robust: it supports unusual filenames and (unlike xargs) is not inherently limited by system command-line sizes (see xargs --show-limits).
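As a quick sanity check, listing the archive should show directory entries only; for the earlier example, something like:

$ tar tf directory-structure.tar
backup/
backup/My Documents/
backup/My Documents/stuff/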
for i in `find . -type d`; do mkdir -p /tmp/tar_root/`echo $i|sed 's/\.\///'`; done
pushd /tmp/tar_root
tar cf tarfile.tar *
popd
# rm -fr /tmp/tar_root
go into the folder you want to start at (that's why we use find .)
save the tar file somewhere else; I think I got an error leaving it right there
tar with r, not c: with cf you keep creating new files and only get the last set of subdirectories, whereas tar r appends to the tar file
--no-recursion, because find is already giving you the whole list of directories, so you don't want tar to recurse
find . -type d | xargs tar rf /somewhereelse/whatever-dirsonly.tar --no-recursion
tar tvf /somewhereelse/whatever-dirsonly.tar | more to check what you got.
For AIX:
tar cvfD some-tarball.tar `find /dir_to_start_from -type d -print`
I ran the following command in a parametrized version of a script:
Script1:
Nooffiles=`find $1 -mmin $2 -type f -name "$3" | wc -l`
if test $Nooffiles -eq 0
then
    exit 1
else
    echo "Successful"
    find $1 -mmin $2 -type f -name "$3" -exec mv '{}' $4 \;
fi
Script1 works fine: it moves the files from the $1 directory to $4. But after it moves the files to the new directory, I have to run another script like this:
Script2:
for name in `find $1 -type f -name "$2"`
do
    filename=`ls $name | xargs -n1 basename`
    line=`tail -1 $filename | sed "s/Z/Z|$filename/"`
    echo $line >> $3
    echo $filename | xargs -n1 basename
done
Here, script2 is reading from the directory that the files were moved to by the previous script, script1. They exist in that directory, since script1 worked fine; the ls command displays them. But script2 says:
File.txt: No such file or directory
Even though ls shows them in the directory, I get an error message like this.
Please help.
Your script really is a mess, and please be aware that you should NEVER parse the output of ls (or of find without the -print0 option) to get filenames. See Bash Pitfalls #1.
Apart from that, I think the problem is that in your loop you truncate the filenames output by find with basename, but then call tail with the bare filename as its argument, even though the file isn't located in the current folder.
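In other words, the failing sequence boils down to something like this (the destination path here is hypothetical):

filename=`basename "/dest/dir/File.txt"`    # filename is now just File.txt
tail -1 "$filename"                         # looks for ./File.txt in the current dir: No such file or directory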
I don't understand exactly what you are doing there, but here is some more correct code that perhaps does close to what you want:
find "$1" -type f -name "$2" -print0 | while read -d '' name
do
filename=`basename "$name"`
tail -1 "$name" | sed "s/Z/Z|$filename/" >> "$3"
echo "$filename"
done
But there are still pitfalls in this script. It is likely to fail with queer filenames output by find: for example, if a filename contains characters that are special to sed, or if at some point $filename is --help, etc.