Korn shell metacharacters: What does !(some text) mean? - unix

Trying to figure out how ksh is processing the construct !(text). For example,
$ echo !(hello)
produces a list of files in the current directory (similar to the output of an ls command, except it's sorted into columns rather than rows). It doesn't matter what text is in the parens, the output is the same.
Can anyone enlighten me as to what the command is actually doing? Thanks!

It echoes all files except hello. You can also use wildcards like echo !(*.java)

Here's some more detailed information. For more info, look in the "file name generation" section of the ksh man page (bash works the same way once its extglob option is enabled with shopt -s extglob). See here for more patterns: https://www.mkssoftware.com/docs/man1/sh.1.asp
A sub-pattern begins with a ?, *, +, @, or ! character followed by a pattern-list
enclosed in parentheses. Pattern-lists themselves can contain sub-patterns.
The following list describes valid sub-patterns.
?(pattern-list)
Matches exactly zero or exactly one occurrence of the specified pattern-list.
*(pattern-list)
Matches zero or more occurrences of the specified pattern-list.
+(pattern-list)
Matches one or more occurrences of the specified pattern-list.
@(pattern-list)
Matches exactly one occurrence of the specified pattern-list.
!(pattern-list)
Matches any string that does not match the specified pattern-list.
So for your example, when the shell sees the unquoted exclamation point followed by a parenthesis, it goes into file name matching mode and expands the pattern to the files in the current directory that do not match "hello"; echo then prints them.
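The behaviour described above can be reproduced in a scratch directory. The following is a minimal sketch for bash (which needs extglob switched on; ksh has these patterns enabled by default). The directory and the file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: reproduce `echo !(hello)` in a scratch directory.
# extglob must be enabled before bash parses any line that uses !( ) or @( ).
shopt -s extglob nullglob

demo_dir=$(mktemp -d)   # hypothetical scratch directory
touch "$demo_dir"/hello "$demo_dir"/world "$demo_dir"/notes.txt "$demo_dir"/main.java

cd "$demo_dir" || exit 1
not_hello=( !(hello) )         # every name except "hello"
not_java=( !(*.java) )         # every name that does not end in .java
only_hw=( @(hello|world) )     # exactly one of the listed alternatives
cd - >/dev/null || exit 1

echo "${not_hello[@]}"   # ==> main.java notes.txt world
echo "${not_java[@]}"    # ==> hello notes.txt world
echo "${only_hw[@]}"     # ==> hello world

rm -rf "$demo_dir"
```

If no file named hello exists, !(hello) simply matches everything, which is why the output looked the same whatever text was put in the parentheses.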

Related

Delete part of file name (up to but NOT including string)

I have a script that was very kindly provided for me a while ago which allowed me to generate input files by inserting coordinates from a series of .xyz files into a template file (Create new files by copying contents of coordinate files into template file).
I'm trying to adapt that script to do something very similar, but different in a very slight, but annoying way. In the script, the new directories created to house these new files are named like this:
# File name is in the form '....Hnnn.xyz';
# this will parse nnn from that name.
local inputNumber=$coordFile
# Remove '.xyz'.
inputNumber=${inputNumber%.xyz}
# Remove everything up to and including the 'H'.
inputNumber=${inputNumber##*H}
# Subdirectory name is based on the input number.
local outDir=$baseDir/D$inputNumber
# Create the directory if it doesn't exist.
if [[ ! -d $outDir ]]; then
mkdir $outDir
fi
This worked for my last problem, because the files were all named in the form xxxx_DH000.xyz. However, now the files I have are named using the form xxxx.000.xyz. While everything else in the script works, I cannot figure out how to name the new directories in the form 000.
The line in the script which I think needs to be edited slightly is where it says inputNumber=${inputNumber##*H}. What I cannot figure out is how to get the script to delete everything up to but not including a 0. I've searched online, but the only questions/answers I've found relating to the renaming of files by stripping part of the original names speak about deleting everything 'up to and including' a string.
I was able to generate directories named 1, 2, 3, etc. with inputNumber=${inputNumber##*0}, however I want all three digits present (i.e. I would like to create directories 001, 002, 003, etc.).
As an aside, I cannot use the . as the cutoff point, as there are multiple .s in each file name. An example of one of the file names is tma.h2s-2-pes-b97m-d4-tz.011.xyz.
Is there some way to get the script to simply name the files based on the full three digit number?
Although it's not needed in this case, zsh does support deleting text just before a matched pattern in a string. These parameter expansions will remove everything prior to the first 0 in the string, but keep the 0:
inputNumber='tma.h2s-2-pes-b97m-d4-tz.011.xyz'
inputNumber=${inputNumber:r} # remove '.xyz'
inputNumber=${(SM)inputNumber##0*}
print ${inputNumber}
# ==> 011
This includes a few zsh-isms:
${...:r} returns the 'root' of a filename, removing the extension.
(S) - parameter expansion flag to change the behavior of the ## expansion. It will now search for patterns in the middle of a string, not just at the beginning.
(M) - flag to include the pattern match (the 0*) in the result.
This depends on the number always starting with 0, which may not be a good choice - what file comes after 099?
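For shells without the zsh flags, the same "up to but not including" effect can be approximated with two standard expansions: first compute the prefix that precedes the first 0, then strip exactly that prefix. This is only a sketch; the variable name prefix is introduced here for illustration, and it carries the same caveat about relying on a leading 0.

```shell
inputNumber='tma.h2s-2-pes-b97m-d4-tz.011.xyz'
inputNumber=${inputNumber%.xyz}        # remove '.xyz'
prefix=${inputNumber%%0*}              # everything before the first 0
inputNumber=${inputNumber#"$prefix"}   # strip that prefix, keeping the 0
printf '%s\n' "$inputNumber"
# ==> 011
```

Quoting "$prefix" inside the expansion makes the shell treat it as literal text rather than as a pattern.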
This next version uses a zsh extended glob pattern to find a number between two periods, and returns that number - i.e. it will find the number in .11., .011., or .2345., but not in .x11.:
coordFile='tma.h2s-2-pes-b97m-d4-tz.022.xyz'
inputNumber=${(*)coordFile//(#b)*.(<->).*/${match}}
print ${inputNumber}
# ==> 022
Some of the pieces:
${...//.../...} - substitution expansion.
(*) - enables extendedglob for this expansion.
(#b) - globbing flag to enable 'backreferences', so that $match will work.
<-> - matches a number. This can be restricted to a range if needed, like <100-199>.
(<->) - puts the number into a match group.
*. and .* - everything before and after the number; these are not in the match group.
${match} - the matched string from the parenthesized part of the pattern. This is used as the replacement for the entire string, so we get just the number. If more than one part of the input string matches the pattern, this will be the last one. match is actually an array, but since there's only one match group in the pattern, it does not need to be indexed with ${match[1]}.
This variant uses a standard regular expression to find the number:
coordFile='tma.h2s-2-pes-b97m-d4-tz.033.xyz'
match=
[[ $coordFile =~ .*\\.([[:digit:]]+)\\..* ]]
inputNumber=${match[1]}
print ${inputNumber}
# ==> 033
After the [[ ]] test, the match array will contain matches from any parenthesized groups in the regular expression - here, that will be a set of one or more digits in between two periods / full stops.
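For comparison, a rough bash counterpart of the regular-expression approach is sketched below. It assumes bash rather than zsh: bash stores the groups in the BASH_REMATCH array instead of match, and keeping the regex in a variable (here called re, a name chosen for this example) avoids quoting surprises.

```shell
#!/usr/bin/env bash
coordFile='tma.h2s-2-pes-b97m-d4-tz.033.xyz'
# A period, one or more digits (captured), a period, then the extension.
re='\.([[:digit:]]+)\.[^.]*$'
inputNumber=
if [[ $coordFile =~ $re ]]; then
  inputNumber=${BASH_REMATCH[1]}   # first parenthesized group
fi
printf '%s\n' "$inputNumber"
# ==> 033
```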
But, as @choroba and Fravadona have noted, since the number will always be at the end of the string, you can use the standard #/##/%/%% expansions to remove parts of the string based only on the .s. This is a common idiom that will be familiar to many shell programmers, and will also work in bash (note that other parts of your original script depend on zsh).
inputNumber='tma.h2s-2-pes-b97m-d4-tz.044.xyz'
inputNumber=${inputNumber%.xyz}
inputNumber=${inputNumber##*.}
print ${inputNumber}
# ==> 044
In zsh everything can be consolidated into a single nested substitution:
baseDir='files/are/here'
coordFile='tma.h2s-2-pes-b97m-d4-tz.055.xyz'
local outDir=$baseDir/D${${coordFile:r}##*.}
print $outDir
# ==> files/are/here/D055
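Nested substitutions are specific to zsh. In a POSIX-style shell the same steps could be wrapped in a small helper, sketched here under the assumption that every coordinate file ends in .NNN.xyz. The function name out_dir_for is hypothetical and not part of the original script.

```shell
# out_dir_for BASEDIR COORDFILE: print BASEDIR/D<number>
# (hypothetical helper; runs in a subshell so its variables stay private)
out_dir_for() (
  name=${2##*/}        # drop any leading directory components
  name=${name%.xyz}    # drop the '.xyz' extension
  printf '%s/D%s\n' "$1" "${name##*.}"   # keep only what follows the last '.'
)

out_dir_for files/are/here tma.h2s-2-pes-b97m-d4-tz.055.xyz
# ==> files/are/here/D055
```

The result can then be passed to mkdir -p, which creates the directory only if it does not already exist.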

zsh match files not containing dash

I have file listing as the following one:
001file.jpg
003file.jpg
001-800x600-sq.jpg
001-800x600.jpg
002-800x600-sq.jpg
002-800x600.jpg
003-800x600-sq.jpg
003-800x600.jpg
004-800x531-sq.jpg
004-800x531.jpg
005-800x531-sq.jpg
005-800x531.jpg
006-800x531-sq.jpg
006-800x531.jpg
007-800x531-sq.jpg
007-800x531.jpg
008-800x1067-sq.jpg
008-800x1067.jpg
009-800x1067-sq.jpg
009-800x1067.jpg
010-800x533-sq.jpg
010-800x533.jpg
011-800x1200-sq.jpg
011-800x1200.jpg
012-800x533-sq.jpg
012-800x533.jpg
013-800x600-sq.jpg
013-800x600.jpg
014-800x1067-sq.jpg
014-800x1067.jpg
015-800x533-sq.jpg
015-800x533.jpg
016-800x533-sq.jpg
016-800x533.jpg
In ZSH, I want to list all files beginning with any number, not containing dash in filename, so I tried:
print -l <->[^-]*.jpg
with no success. What is wrong with this pattern!?
This is, I think, similar to the case that the documentation for <-> warns about:
Be careful when using other wildcards adjacent to patterns of this form; for example, <0-9>* will actually match any number whatsoever at the start of the string, since the `<0-9>' will match the first digit, and the `*' will match any others. This is a trap for the unwary, but is in fact an inevitable consequence of the rule that the longest possible match always succeeds. Expressions such as `<0-9>[^[:digit:]]*' can be used instead.
In print -l <->[^-]*.jpg, the <-> can match just the leading digit or digits, then [^-] matches the next digit, and * matches everything else.
Use instead
print -l <->[^[:digit:]-]*.jpg
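The question is about zsh, but the same trap and fix can be tried in bash with extglob, where +([0-9]) roughly stands in for zsh's <->. The sketch below uses a made-up scratch directory and adds one invented file, 002file-x.jpg, to show a limitation: the corrected pattern only requires the character right after the number to be neither a digit nor a dash, so a dash later in the name still gets through. The second pattern is one possible way to reject a dash anywhere.

```shell
#!/usr/bin/env bash
shopt -s extglob nullglob

demo_dir=$(mktemp -d)   # hypothetical scratch directory
touch "$demo_dir"/{001file,003file,001-800x600,001-800x600-sq,002file-x}.jpg

cd "$demo_dir" || exit 1
loose=( +([0-9])[^0-9-]*.jpg )   # number, then a char that is not a digit or dash
strict=( [0-9]*([^-]).jpg )      # starts with a digit, no dash anywhere
cd - >/dev/null || exit 1

printf '%s\n' "${loose[*]}"    # ==> 001file.jpg 002file-x.jpg 003file.jpg
printf '%s\n' "${strict[*]}"   # ==> 001file.jpg 003file.jpg

rm -rf "$demo_dir"
```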

Finding words in a file that contain the word file and do not contain hyphens

I am trying to find words in a file that contain the word file and do not contain hyphens. The pattern looks correct to me, but my output shows all the words with a hyphen and the word file. My path name is /folder/file.txt.
cat /folder/file.txt | grep file[!-]*
! isn't the right operator for negation. You need ^.
file[!-]*
will match any string containing the word 'file' and zero or more instances of '!' or '-'.
So basically - anything with the word 'file' in it. If you want to negate a character class, you need to use ^. But the * then allows for zero of the 'not patterns'.
If the dash is immediately after the word file then:
file[^-]
will match:
file1243
somefilefilea
but not:
file-1234
I think what you may be missing from your pattern is that * allows you to ignore part of the pattern.
^file[^-]*$
might do what you're after?
https://www.regex101.com/ will let you test regular expressions.
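The patterns above can be compared on a few sample words. This is a sketch with invented input, not the asker's file. The patterns are quoted so that the shell passes them to grep unchanged instead of trying to expand them as file name patterns. The last pipeline is one possible way to require "contains file, no hyphen anywhere", which none of the single patterns expresses.

```shell
# Sample words; these are made up for illustration.
words='file1243
somefilefilea
file-1234
my-file
profile'

# 'file' immediately followed by a character that is not a hyphen.
followed=$(printf '%s\n' "$words" | grep 'file[^-]')

# Whole line starts with 'file' and has no hyphen after it.
anchored=$(printf '%s\n' "$words" | grep '^file[^-]*$')

# Contains 'file' and no hyphen anywhere on the line.
no_hyphen=$(printf '%s\n' "$words" | grep 'file' | grep -v -e '-')

printf '%s\n' "$followed"    # ==> file1243, somefilefilea (one per line)
printf '%s\n' "$anchored"    # ==> file1243
printf '%s\n' "$no_hyphen"   # ==> file1243, somefilefilea, profile (one per line)
```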

Difference between dir/**/* and dir/*/* in Unix glob pattern?

It seems that the output is the same when I echo both patterns.
I also tested other commands such as open, but the results from both are the same.
In traditional sh-style pattern matching, * matches zero or more characters in a component of the file name, so there is no difference between *, **, and ***, either on its own or as part of a larger pattern.
However, there are globbing syntaxes that assign a distinct meaning to **. Pattern matching implemented by the Z shell, for example, expands x/**/y to all file names beginning with x/ and ending in /y regardless of how many directories are in between, thus matching all of x/y, x/subdir/y, x/subdir1/subdir2/y, etc. This syntax was later implemented by bash, although only enabled when the globstar configuration option is set by the user.
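The difference becomes visible once the directory tree is more than two levels deep. Here is a small sketch for bash 4 or later with globstar enabled; the directory layout is invented for the example.

```shell
#!/usr/bin/env bash
shopt -s globstar nullglob

demo_dir=$(mktemp -d)   # hypothetical scratch directory
mkdir -p "$demo_dir"/dir/sub/deep
touch "$demo_dir"/dir/a.txt "$demo_dir"/dir/sub/b.txt "$demo_dir"/dir/sub/deep/c.txt

cd "$demo_dir" || exit 1
one_level=( dir/*/* )    # entries exactly two levels below dir
any_depth=( dir/**/* )   # entries at any depth below dir
cd - >/dev/null || exit 1

printf '%s\n' "${#one_level[@]}"   # ==> 2 (dir/sub/b.txt and dir/sub/deep)
printf '%s\n' "${#any_depth[@]}"   # ==> 5 (every file and directory under dir)

rm -rf "$demo_dir"
```

Without globstar, bash treats ** like *, so both patterns give the same two entries, which matches what the question observed.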

error in Unix (scripts)

Does anybody know what this error means?
Missing -.
I found nothing about it on Google.
The only case where tcsh can produce that error message is when you're trying to substitute a range of words from an array variable, and the selector is syntactically incorrect.
Quoting the tcsh man page:
$name[selector]
${name[selector]}
Substitutes only the selected words from the value of name.
The selector is subjected to `$' substitution and may consist
of a single number or two numbers separated by a `-'. The
first word of a variable's value is numbered `1'. If the first
number of a range is omitted it defaults to `1'. If the last
member of a range is omitted it defaults to `$#name'. The
selector `*' selects all words. It is not an error for a range
to be empty if the second argument is omitted or in range.
For example:
$ echo $path[5-6]
/usr/sbin /usr/bin
$ echo $path[5_6]
Missing -.
Perhaps if you had followed up when you were asked for more information (like, say, some code from the failing script), it wouldn't have taken over a year to get an answer.
