I have file listing as the following one:
001file.jpg
003file.jpg
001-800x600-sq.jpg
001-800x600.jpg
002-800x600-sq.jpg
002-800x600.jpg
003-800x600-sq.jpg
003-800x600.jpg
004-800x531-sq.jpg
004-800x531.jpg
005-800x531-sq.jpg
005-800x531.jpg
006-800x531-sq.jpg
006-800x531.jpg
007-800x531-sq.jpg
007-800x531.jpg
008-800x1067-sq.jpg
008-800x1067.jpg
009-800x1067-sq.jpg
009-800x1067.jpg
010-800x533-sq.jpg
010-800x533.jpg
011-800x1200-sq.jpg
011-800x1200.jpg
012-800x533-sq.jpg
012-800x533.jpg
013-800x600-sq.jpg
013-800x600.jpg
014-800x1067-sq.jpg
014-800x1067.jpg
015-800x533-sq.jpg
015-800x533.jpg
016-800x533-sq.jpg
016-800x533.jpg
In ZSH, I want to list all files beginning with any number, not containing dash in filename, so I tried:
print -l <->[^-]*.jpg
with no success. What is wrong with this pattern!?
This is, I think, similar to the case that the documentation for <-> warns about:
Be careful when using other wildcards adjacent to patterns of this form; for example, <0-9>* will actually match any number whatsoever at the start of the string, since the `<0-9>' will match the first
digit, and the `*' will match any others. This is a trap for the unwary, but is in fact an inevitable
consequence of the rule that the longest possible match always succeeds. Expressions such as
`<0-9>[^[:digit:]]*' can be used instead.
In print -l <->[^-]*.jpg, the <-> matches the first digit, then [^-] matches the 2nd digit, and * matches everything thing else.
Use instead
print -l <->[^[:digit:]-]*.jpg
Related
I have a script that was very kindly provided for me a while ago which allowed me to generate input files by inserting coordinates from a series of .xyz files into a template file (Create new files by copying contents of coordinate files into template file).
I'm trying to adapt that script to do something very similar, but different in a very slight, but annoying way. In the script, the new directories created to house these new files are named like this:
# File name is in the form '....Hnnn.xyz';
# this will parse nnn from that name.
local inputNumber=$coordFile
# Remove '.xyz'.
inputNumber=${inputNumber%.xyz}
# Remove everything up to and including the 'H'.
inputNumber=${inputNumber##*H}
# Subdirectory name is based on the input number.
local outDir=$baseDir/D$inputNumber
# Create the directory if it doesn't exist.
if [[ ! -d $outDir ]]; then
mkdir $outDir
fi
This worked for my last problem, because the files were all named in the form xxxx_DH000.xyz. However, now the files I have are named using the form xxxx.000.xyz. While everything else in the script works, I cannot figure out how to name the new directories in the form 000.
The line in the script which I think needs to be edited slightly is where it says inputNumber=${inputNumber##*H}. What I cannot figure out is how to get the script to delete everything up to but not including a 0. I've searched online, but the only questions/answers I've found relating to the renaming of files by stripping part of the original names speaks about deleting everything 'up to and including' a string.
I was able to generate directories named 1, 2, 3, etc. with inputNumber=${inputNumber##*0}, however I want all three digits present (i.e. I would like create directories 001, 002, 003, etc.).
As an aside, I cannot use the . as the cutoff point, as there are multiple .s in each file name. An example of one of the file names is tma.h2s-2-pes-b97m-d4-tz.011.xyz.
Is there some way to get the script to simply name the files based on the full three digit number?
Although it's not needed in this case, zsh does support deleting text just before a matched pattern in a string. These parameter expansions will remove everything prior to the first 0 in the string, but keep the 0:
inputNumber='tma.h2s-2-pes-b97m-d4-tz.011.xyz'
inputNumber=${inputNumber:r} # remove '.xyz'
inputNumber=${(SM)inputNumber##0*}
print ${inputNumber}
# ==> 011
This includes a few zsh-isms:
${...:r} returns the 'root' of a filename, removing the extension.
(S) - parameter expansion flag to change the behavior of the ## expansion. It will now search for patterns in the middle of a string, not just at the beginning.
(M) - flag to include the pattern match (the 0*) in the result.
This depends on the number always starting with 0, which may not be a good choice - what file comes after 099?
This next version uses a zsh extended glob pattern to find a number between two periods, and returns that number - i.e. it will find the number in .11., .011., or .2345., but not in .x11.:
coordFile='tma.h2s-2-pes-b97m-d4-tz.022.xyz'
inputNumber=${(*)coordFile//(#b)*.(<->).*/${match}}
print ${inputNumber}
# ==> 022
Some of the pieces:
${...//.../...} - substitution expansion.
(*) - enables extendedglob for this expansion.
(#b) - globbing flag to enable 'backreferences', so that $match will work.
<-> - matches a number. This can be restricted to a range if needed, like <100-199>.
(<->) - puts the number into a match group.
*. and .* - everything before and after the number; these are not in the match group.
${match} - the matched string from the parenthesized part of the pattern. This is used as the replacement for the entire string, so we get just the number. If more than one part of the input string matches the pattern, this will be the last one. match is actually an array, but since there's only one match group in the pattern, it does not need to be indexed with ${match[1]}.
This variant uses a standard regular expression to find the number:
coordFile='tma.h2s-2-pes-b97m-d4-tz.033.xyz'
match=
[[ $coordFile =~ .*\\.([[:digit:]]+)\\..* ]]
inputNumber=${match[1]}
print ${inputNumber}
# ==> 033
After the [[ ]] test, the match array will contain matches from any parenthesized groups in the regular expression - here, that will be a set of one or more digits in between two periods / full stops.
But, as #choroba and Fravadona have noted, since the number will be always be at the end of the string, you can use the standard #/##/%/%% expansions to remove parts of the string based only on the .s. This is a common idiom that will be familiar to many shell programmers, and will also work in bash (note that other parts of your original script depend on zsh).
inputNumber='tma.h2s-2-pes-b97m-d4-tz.044.xyz'
inputNumber=${inputNumber%.xyz}
inputNumber=${inputNumber##*.}
print ${inputNumber}
# ==> 044
In zsh everything can be consolidated into a single nested substitution:
baseDir='files/are/here'
coordFile='tma.h2s-2-pes-b97m-d4-tz.055.xyz'
local outDir=$baseDir/D${${coordFile:r}##*.}
print $outDir
# ==> files/are/here/D055
My regex pattern looks something like
<xxxx location="file path/level1/level2" xxxx some="xxx">
I am only interested in the part in quotes assigned to location. Shouldn't it be as easy as below without the greedy switch?
/.*location="(.*)".*/
Does not seem to work.
You need to make your regular expression lazy/non-greedy, because by default, "(.*)" will match all of "file path/level1/level2" xxx some="xxx".
Instead you can make your dot-star non-greedy, which will make it match as few characters as possible:
/location="(.*?)"/
Adding a ? on a quantifier (?, * or +) makes it non-greedy.
Note: this is only available in regex engines which implement the Perl 5 extensions (Java, Ruby, Python, etc) but not in "traditional" regex engines (including Awk, sed, grep without -P, etc.).
location="(.*)" will match from the " after location= until the " after some="xxx unless you make it non-greedy.
So you either need .*? (i.e. make it non-greedy by adding ?) or better replace .* with [^"]*.
[^"] Matches any character except for a " <quotation-mark>
More generic: [^abc] - Matches any character except for an a, b or c
How about
.*location="([^"]*)".*
This avoids the unlimited search with .* and will match exactly to the first quote.
Use non-greedy matching, if your engine supports it. Add the ? inside the capture.
/location="(.*?)"/
Use of Lazy quantifiers ? with no global flag is the answer.
Eg,
If you had global flag /g then, it would have matched all the lowest length matches as below.
Here's another way.
Here's the one you want. This is lazy [\s\S]*?
The first item:
[\s\S]*?(?:location="[^"]*")[\s\S]* Replace with: $1
Explaination: https://regex101.com/r/ZcqcUm/2
For completeness, this gets the last one. This is greedy [\s\S]*
The last item:[\s\S]*(?:location="([^"]*)")[\s\S]*
Replace with: $1
Explaination: https://regex101.com/r/LXSPDp/3
There's only 1 difference between these two regular expressions and that is the ?
The other answers here fail to spell out a full solution for regex versions which don't support non-greedy matching. The greedy quantifiers (.*?, .+? etc) are a Perl 5 extension which isn't supported in traditional regular expressions.
If your stopping condition is a single character, the solution is easy; instead of
a(.*?)b
you can match
a[^ab]*b
i.e specify a character class which excludes the starting and ending delimiiters.
In the more general case, you can painstakingly construct an expression like
start(|[^e]|e(|[^n]|n(|[^d])))end
to capture a match between start and the first occurrence of end. Notice how the subexpression with nested parentheses spells out a number of alternatives which between them allow e only if it isn't followed by nd and so forth, and also take care to cover the empty string as one alternative which doesn't match whatever is disallowed at that particular point.
Of course, the correct approach in most cases is to use a proper parser for the format you are trying to parse, but sometimes, maybe one isn't available, or maybe the specialized tool you are using is insisting on a regular expression and nothing else.
Because you are using quantified subpattern and as descried in Perl Doc,
By default, a quantified subpattern is "greedy", that is, it will
match as many times as possible (given a particular starting location)
while still allowing the rest of the pattern to match. If you want it
to match the minimum number of times possible, follow the quantifier
with a "?" . Note that the meanings don't change, just the
"greediness":
*? //Match 0 or more times, not greedily (minimum matches)
+? //Match 1 or more times, not greedily
Thus, to allow your quantified pattern to make minimum match, follow it by ? :
/location="(.*?)"/
import regex
text = 'ask her to call Mary back when she comes back'
p = r'(?i)(?s)call(.*?)back'
for match in regex.finditer(p, str(text)):
print (match.group(1))
Output:
Mary
Trying to figure out how ksh is processing the construct !(text). For example,
$ echo !(hello)
produces a list of files in the current directory (similar to the output of an ls command, except it's sorted into columns rather than rows). It doesn't matter what text is in the parens, the output is the same.
Can anyone enlighten me as to what the command is actually doing? Thanks!
It echoes all files except hello. You can also use wildcards like echo !(*.java)
Here's some more detailed information. For more info, look in the "file name generation" section of the ksh man page (bash works the same way). See here for more patterns: https://www.mkssoftware.com/docs/man1/sh.1.asp
A sub-pattern begins with a ?, *, +, #, or ! character followed by a pattern-list
enclosed in parentheses. Pattern-lists themselves can contain sub-patterns.
The following list describes valid sub-patterns.
?(pattern-list)
Matches exactly zero or exactly one occurrence of the specified pattern-list.
*(pattern-list)
Matches zero or more occurrences of the specified pattern-list.
+(pattern-list)
Matches one or more occurrences of the specified pattern-list.
#(pattern-list)
Matches exactly one occurrence of the specified pattern-list.
!(pattern-list)
Matches any string that does not match the specified pattern-list.
So for your example, when the shell sees the unquoted exclamation point, followed by parenthesis it goes into file name matching mode, then it displays files in the current directory that do not match "hello".
My current regex is only picking up part of my string. It creates a match as soon as one if found, even though I need the longer version of that match to hit. For example, I am creating matches for both:
SSS111
and
SSS111-L
The first SSS111 matches fine with my current regex, but the SSS111-L is only getting matched to the SSS111, leaving the -L out.
How can I create a greedy regex to read the whole line before matching? I am currently using
[-A-Z0-9]{3,12}
to capture the numbers and letters, but have not had any luck outside of this.
Regex are allways greedy. This ist mostly the Problem.
Here i think you have only to escape the '-'
#"[-A-Z]{3-12}"
It seems that the output are the same when I echoed it.
I also tested other commands such as open, but the results from both are the same.
In traditional sh-style pattern matching, * matches zero or more characters in a component of the file name, so there is no difference between *, **, and ***, either on its own or as part of a larger pattern.
However, there are globbing syntaxes that assign a distinct meaning to **. Pattern matching implemented by the Z shell, for example, expands x/**/y to all file names beginning with x/ and ending in /y regardless of how many directories are in between, thus matching all of x/y, x/subdir/y, x/subdir1/subdir2/y, etc. This syntax was later implemented by bash, although only enabled when the globstar configuration option is set by the user.