What is the zsh equivalent for $BASH_REMATCH[]? - zsh

What is the equivalent in zsh for $BASH_REMATCH, and how is it used?

Without setting any options, one could simply use the match array that zsh fills in by default:
$match[1]
in place of
${BASH_REMATCH[1]}

To make zsh behave the same as bash, use:
setopt BASH_REMATCH
Or within a function consider:
setopt local_options BASH_REMATCH
(this will only set the option within the scope of the function)
Then just use $BASH_REMATCH as you would in bash.
The manual says about BASH_REMATCH:
When set, matches performed with the =~ operator will set the BASH_REMATCH array variable, instead of the default MATCH and match variables. The first element of the BASH_REMATCH array will contain the entire matched text and subsequent elements will contain extracted substrings. This option makes more sense when KSH_ARRAYS is also set, so that the entire matched portion is stored at index 0 and the first substring is at index 1. Without this option, the MATCH variable contains the entire matched text and the match array variable contains substrings.
With that option set, =~ behaves as it does in bash. For the full behaviour, here is how the manual describes the operator:
string =~ regexp
true if string matches the regular expression regexp. If the option RE_MATCH_PCRE is set regexp is tested as a PCRE regular expression using the zsh/pcre module, else it is tested as a POSIX extended regular expression using the zsh/regex module. Upon successful match, some variables will be updated; no variables are changed if the matching fails.
If the option BASH_REMATCH is not set the scalar parameter MATCH is set to the substring that matched the pattern and the integer parameters MBEGIN and MEND to the index of the start and end, respectively, of the match in string, such that if string is contained in variable var the expression ‘${var[$MBEGIN,$MEND]}’ is identical to ‘$MATCH’. The setting of the option KSH_ARRAYS is respected. Likewise, the array match is set to the substrings that matched parenthesised subexpressions and the arrays mbegin and mend to the indices of the start and end positions, respectively, of the substrings within string. The arrays are not set if there were no parenthesised subexpressions. For example, if the string ‘a short string’ is matched against the regular expression ‘s(...)t’, then (assuming the option KSH_ARRAYS is not set) MATCH, MBEGIN and MEND are ‘short’, 3 and 7, respectively, while match, mbegin and mend are single entry arrays containing the strings ‘hor’, ‘4’ and ‘6’, respectively.
If the option BASH_REMATCH is set the array BASH_REMATCH is set to the substring that matched the pattern followed by the substrings that matched parenthesised subexpressions within the pattern.
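The index arithmetic in the manual's example is easy to get wrong, so here is a rough cross-check in Python. This is only an analogy: Python's re module is not the POSIX or PCRE engine that zsh uses, zsh_style_match is a made-up helper name, and zsh stores the array entries as strings rather than integers. It assumes KSH_ARRAYS is unset, so indices are 1-based.

```python
import re

def zsh_style_match(string, pattern):
    """Mimic the variables zsh sets after a successful `string =~ pattern`."""
    m = re.search(pattern, string)
    if m is None:
        return None  # zsh leaves the variables untouched when the match fails
    groups = range(1, m.re.groups + 1)
    return {
        "MATCH": m.group(0),
        "MBEGIN": m.start(0) + 1,  # zsh counts from 1 when KSH_ARRAYS is unset
        "MEND": m.end(0),          # inclusive 1-based end equals Python's exclusive end
        "match": [m.group(i) for i in groups],
        "mbegin": [m.start(i) + 1 for i in groups],
        "mend": [m.end(i) for i in groups],
    }

result = zsh_style_match("a short string", r"s(...)t")
print(result)
```

The values agree with the manual: MATCH is short with MBEGIN 3 and MEND 7, and the single capture group is hor at positions 4 to 6.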

Related

SQLite3 regexp performance

How performant is the SQLite3 REGEXP operator?
For simplicity, assume a simple table with a single column pattern and an index
CREATE TABLE `foobar` (`pattern` TEXT);
CREATE UNIQUE INDEX `foobar_index` ON `foobar`(`pattern`);
and a query like
SELECT * FROM `foobar` WHERE `pattern` REGEXP 'foo.*'
I have been trying to compare and understand the output from EXPLAIN and it seems to be similar to using LIKE except it will be using regexp for matching. However, I am not fully sure how to read the output from EXPLAIN and I'm not getting a grasp of how performant it will be.
I understand it will be slow compared to an indexed WHERE `pattern` = 'foo' query, but is it slower than or similar to LIKE?
SQLite does not optimize WHERE ... REGEXP ... to use indexes. x REGEXP y is simply a function call; it's equivalent to regexp(y,x). Also note that not all installations of SQLite have a regexp function defined, so using it (or the REGEXP operator) is not very portable. LIKE/GLOB, on the other hand, can take advantage of indexes for prefix queries provided that some additional conditions are met:
The right-hand side of the LIKE or GLOB must be either a string literal or a parameter bound to a string literal that does not begin with a wildcard character.
It must not be possible to make the LIKE or GLOB operator true by having a numeric value (instead of a string or blob) on the left-hand side. This means that either:
the left-hand side of the LIKE or GLOB operator is the name of an indexed column with TEXT affinity, or
the right-hand side pattern argument does not begin with a minus sign ("-") or a digit.
This constraint arises from the fact that numbers do not sort in lexicographical order. For example: 9<10 but '9'>'10'.
The built-in functions used to implement LIKE and GLOB must not have been overloaded using the sqlite3_create_function() API.
For the GLOB operator, the column must be indexed using the built-in BINARY collating sequence.
For the LIKE operator, if case_sensitive_like mode is enabled then the column must be indexed using the BINARY collating sequence, or if case_sensitive_like mode is disabled then the column must be indexed using the built-in NOCASE collating sequence.
If the ESCAPE option is used, the ESCAPE character must be ASCII, or a single-byte character in UTF-8.
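To see the difference for yourself, here is a minimal sketch using Python's built-in sqlite3 module on the table from the question. It assumes the SQLite build bundled with Python has no regexp function, so it registers one (the regexp helper and the plan helper are names I made up). GLOB is used for the prefix query because the index here has the default BINARY collation, which per the conditions above is what GLOB needs.

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite calls regexp(Y, X) for "X REGEXP Y", so the pattern arrives first.
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE foobar (pattern TEXT)")
conn.execute("CREATE UNIQUE INDEX foobar_index ON foobar(pattern)")
conn.executemany("INSERT INTO foobar VALUES (?)", [("foo",), ("foobar",), ("bar",)])

def plan(sql):
    # The human-readable step is the last column of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

regexp_plan = plan("SELECT * FROM foobar WHERE pattern REGEXP '^foo'")
glob_plan = plan("SELECT * FROM foobar WHERE pattern GLOB 'foo*'")
print(regexp_plan)
print(glob_plan)

regexp_rows = sorted(r[0] for r in conn.execute(
    "SELECT pattern FROM foobar WHERE pattern REGEXP '^foo'"))
print(regexp_rows)
```

The exact plan wording varies between SQLite versions, but the REGEXP query should be reported as a SCAN (every row is passed to the function) while the GLOB prefix query should be reported as a SEARCH using foobar_index.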

Regular expression for excluding some specific characters

I am trying to build a regular expression in Qt for the following set of strings:
The set contains all strings of length 1 other than r and z.
The set also contains strings of length greater than 1 that start with z, followed by any number of z's, and end with a single character that is neither r nor z.
So far I have developed the following:
[a-qs-y]?|z+[a-qs-y]
But it does not work.
The question mark makes the first alternative match either a single lowercase letter other than r and z, or the empty string. Because the empty string matches at the start of any input, the first alternative always succeeds and the second alternative is never tried. Without the question mark, your regular expression matches your specification, although you will probably also want to anchor it so that it only matches entire strings:
QRegularExpression re("^[a-qs-y]$|^z+[a-qs-y]$");
QRegularExpressionMatch match = re.match("zzza");
if (match.hasMatch()) {
    QString matched = match.captured(0);
    // ...
}
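If you want to convince yourself of both points without building a Qt project, the same patterns can be tried in Python's re module. This is only a stand-in for QRegularExpression (a different engine, though these simple patterns should behave the same way), and accepts is a hypothetical helper name.

```python
import re

original = re.compile(r"[a-qs-y]?|z+[a-qs-y]")
fixed = re.compile(r"[a-qs-y]|z+[a-qs-y]")

# The optional first alternative succeeds immediately with an empty match,
# so the z+ alternative is never attempted.
empty_hit = original.search("zzza")
print(repr(empty_hit.group(0)))

def accepts(s):
    # fullmatch plays the role of the ^...$ anchors in the Qt version.
    return fixed.fullmatch(s) is not None

samples = ["a", "zzza", "za", "r", "z", "zr", "zz", "ab", ""]
accepted = [s for s in samples if accepts(s)]
print(accepted)
```

Only a, zzza and za are accepted; the empty string, r, z, and anything ending in r or z are rejected.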

What is meaning of ##*/ in unix?

I found syntax like the following:
${VARIABLE##*/}
What is the meaning of ##*/ here?
I know what */ means in ls */, but I don't know what the syntax above does.
This example will make it clear:
VARIABLE='abcd/def/123'
echo "${VARIABLE#*/}"
def/123
echo "${VARIABLE##*/}"
123
##*/ strips the longest match of "anything followed by /" from the start of the value.
#*/ strips the shortest match of "anything followed by /" from the start of the value.
PS: All-uppercase variable names are best avoided in shell scripts, because they can clash with environment variables and the shell's own variables. Prefer variable over VARIABLE.
From man bash:
${parameter#word}
${parameter##word}
Remove matching prefix pattern. The word is expanded to produce a pattern just as in pathname expansion. If the pattern matches the beginning of the value of parameter, then the result of the expansion is the expanded value of parameter with the shortest matching pattern (the ``#'' case) or the longest matching pattern (the ``##'' case) deleted. If parameter is @ or *, the pattern removal operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with @ or *, the pattern removal operation is applied to each member of the array in turn, and the expansion is the resultant list.
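If the shortest versus longest distinction is still fuzzy, here is a small model of it in Python. It is a sketch, not how bash implements it: strip_prefix is a hypothetical helper, and fnmatch patterns are only an approximation of shell patterns.

```python
from fnmatch import fnmatchcase

def strip_prefix(value, pattern, longest=False):
    """Model ${value#pattern} (shortest) or ${value##pattern} (longest)."""
    lengths = range(len(value) + 1)
    if longest:
        lengths = reversed(lengths)
    # Try each candidate prefix in order and delete the first one that matches.
    for n in lengths:
        if fnmatchcase(value[:n], pattern):
            return value[n:]
    return value  # no prefix matches: the value is returned unchanged

variable = "abcd/def/123"
shortest = strip_prefix(variable, "*/")
longest = strip_prefix(variable, "*/", longest=True)
print(shortest)
print(longest)
```

With the value from the example above, the shortest removal leaves def/123 and the longest leaves 123, which is why ${VARIABLE##*/} is a common way to get the last path component.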

Pyparsing: the differences between MatchFirst, Or, and oneOf

In pyparsing, what are the differences between MatchFirst, Or, and oneOf
when the strings share leading characters, as in
word, wording, words
Or(['word', 'wording', 'words'])
MatchFirst(['word', 'wording', 'words'])
oneOf(['word', 'wording', 'words'])
From the online docs (https://pythonhosted.org/pyparsing/)
MatchFirst - If two expressions match, the first one listed is the one that will match.
Or - If two expressions match, the expression that matches the longest string will be used.
oneOf - Helper to quickly define a set of alternative Literals, and makes sure to do longest-first testing when there is a conflict, regardless of the input order, but returns a MatchFirst for best performance.
MatchFirst tests the current parse location with each string in its constructor, stopping at the first one to match.
Or tests the current parse location against all of the strings given in its constructor, and will return the longest match.
oneOf generates a Regex or MatchFirst to match the longest match, by reordering the input list when there are alternatives with common start strings to test the longer string first.
oneOf operates on a str, understood as space-separated strings, and can be simplistically defined as
oneOf = lambda xs: Or(Literal(x) for x in xs.split(" "))
whereas Or operates on expressions, that is, ParserElement instances.
So you can see oneOf as a specialization of Or, or Or as a generalization of oneOf.
You can write oneOf('foo bar') as Literal('foo') ^ Literal('bar')
but you can't write every Or expression using oneOf.
MatchFirst is the same as Or except for how it resolves conflicts: Or yields the longest match, while MatchFirst returns the first match in definition order.
So
expr = Literal('bar') ^ Word(alphanums)
expr.parseString("barstool").asList() == ["barstool"]
but
expr = Literal('bar') | Word(alphanums)
expr.parseString("barstool").asList() == ["bar"]
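To make the three strategies concrete for the word, wording, words case from the question, here is a plain-Python model that does not use pyparsing at all. The function names are made up for illustration, and one_of is simplified: it always sorts longest first, whereas the real oneOf only reorders alternatives that conflict.

```python
def match_first(alternatives, text):
    # Like MatchFirst: return the first alternative that matches, in listed order.
    for alt in alternatives:
        if text.startswith(alt):
            return alt
    return None

def match_longest(alternatives, text):
    # Like Or: try every alternative and keep the longest match.
    hits = [alt for alt in alternatives if text.startswith(alt)]
    return max(hits, key=len) if hits else None

def one_of(alternatives, text):
    # Like oneOf: reorder so longer strings are tested first, then match first.
    return match_first(sorted(alternatives, key=len, reverse=True), text)

words = ["word", "wording", "words"]
first = match_first(words, "wording")
longest = match_longest(words, "wording")
reordered = one_of(words, "wording")
print(first, longest, reordered)
```

In this model the first-match strategy stops at word because it is listed first, while the longest-match and reordered strategies both return wording.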

How to use glob and find brace brackets in sqlite?

I want to use sqlite3 query like this:
select * from Log where Desc glob '*[ _.,:;!?-(){}[]<>''"]OK';
to find records which end with OK, like
OK
asdasda _OK
asda (OK
dasda [OK
dasda ]OK
but this fails when I use a closing bracket in the query, e.g. glob '*[ []]OK';
Any suggestions?
A comment hidden in the source code says:
Globbing rules:
* Matches any sequence of zero or more characters.
? Matches exactly one character.
[...] Matches one character from the enclosed list of characters.
[^...] Matches one character not in the enclosed list.
With the [...] and [^...] matching, a ] character can be included
in the list by making it the first character after [ or ^. A
range of characters can be specified using -. Example:
[a-z] matches any single lower-case letter. To match a -, make
it the last character in the list.
So, your records can be found with ... glob '*[] _.,:;!?(){}[<>''"-]OK'.
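Here is a quick check of that pattern using Python's built-in sqlite3 module. The table and sample rows are taken from the question; passing the pattern as a bound parameter (so the quotes need no SQL escaping) and quoting the "Desc" column name are my own choices for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Log ("Desc" TEXT)')
rows = ["OK", "asdasda _OK", "asda (OK", "dasda [OK", "dasda ]OK", "dasdaxOK"]
conn.executemany("INSERT INTO Log VALUES (?)", [(r,) for r in rows])

# ] comes first in the list and - comes last, so both are taken literally.
pattern = "*[] _.,:;!?(){}[<>'\"-]OK"
matched = [r[0] for r in conn.execute(
    'SELECT "Desc" FROM Log WHERE "Desc" GLOB ? ORDER BY rowid', (pattern,))]
print(matched)
```

Note that the bracket list must consume exactly one character, so a row containing only OK is not matched by this pattern; if you need it as well, add something like OR "Desc" = 'OK' to the WHERE clause.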
