Regular expression for excluding some specific characters - qt

I am trying to build a regular expression in Qt for the following set of strings:
The set contains every string of length 1 except r and z.
The set also includes strings of length greater than 1 that start with z, followed by any number of z's, and end with a single character that is neither r nor z.
So far I have developed the following:
[a-qs-y]?|z+[a-qs-y]
But it does not work.

The question mark in your regular expression lets the first alternative match either a single lowercase letter other than r and z, or the empty string. Because the empty string can be matched at the start of any string, an unanchored search always succeeds with the first alternative and the second alternative is never tried. The rest of your regular expression matches your specification, although you will probably want to anchor it so that it only matches entire strings:
QRegularExpression re("^[a-qs-y]$|^z+[a-qs-y]$");
QRegularExpressionMatch match = re.match("zzza");
if (match.hasMatch()) {
    QString matched = match.captured(0);
    // ...
}
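The effect of the stray ? can be reproduced outside Qt. The following is a minimal Python sketch, not Qt code: it assumes Python's re module behaves like QRegularExpression for these simple patterns, and the helper name in_set is made up for illustration.

```python
import re

ORIGINAL = r"[a-qs-y]?|z+[a-qs-y]"   # pattern from the question
FIXED = r"[a-qs-y]|z+[a-qs-y]"       # same pattern without the '?'

# Unanchored search: the first alternative matches the empty string at
# position 0, so the z+ alternative is never needed.
first_hit = re.search(ORIGINAL, "zzza").group(0)   # ""

def in_set(s):
    """Hypothetical helper: True if the whole string belongs to the set."""
    return re.fullmatch(FIXED, s) is not None

candidates = ["a", "r", "z", "za", "zzza", "zr", "ab"]
accepted = [s for s in candidates if in_set(s)]
# accepted == ["a", "za", "zzza"]
```

Here re.fullmatch plays the role of the ^ and $ anchors in the Qt pattern above.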

Related

Pattern match with R

I am trying to match a pattern using the grep() function as below:
grep("XYZ31__Sheqwqet1__CSV.csv", "^(XYZ)+[0-9]{2}[a-zA-Z_]+(csv)+$")
Unfortunately, the above expression results in no match. Any pointer in the right direction would be very helpful.
Thanks for your time
The part before the final csv also contains a digit and a ., which [a-zA-Z_] does not match. In addition, grep() expects the pattern first, followed by the input x (if the arguments are passed by name, the order does not matter).
grep( "^(XYZ)+[0-9]{2}[[:alnum:]_.]+(csv)$", "XYZ31__Sheqwqet1__CSV.csv")
#[1] 1
The pattern breaks down as follows:
^ - start of the string
(XYZ)+ - one or more occurrences of XYZ
[0-9]{2} - two digits
[[:alnum:]_.]+ - one or more alphanumeric characters, underscores, or dots
(csv)$ - csv at the end of the string
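As a cross-check outside R, the same two patterns can be tried with Python's re module. This is only a sketch: Python does not support the POSIX class [[:alnum:]], so the spelled-out class [A-Za-z0-9_.] is substituted here as an assumed equivalent for ASCII input.

```python
import re

name = "XYZ31__Sheqwqet1__CSV.csv"

# Pattern from the question: the class [a-zA-Z_] cannot consume the
# digit '1' or the '.' that precede the final "csv".
original = re.compile(r"^(XYZ)+[0-9]{2}[a-zA-Z_]+(csv)+$")

# Corrected pattern, with [[:alnum:]_.] written out for Python.
corrected = re.compile(r"^(XYZ)+[0-9]{2}[A-Za-z0-9_.]+(csv)$")

original_matches = original.search(name) is not None    # False
corrected_matches = corrected.search(name) is not None  # True
```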

How to use variable character strings with 'substitute' function in R

I need to be able to fill an expression with the values of an unknown number of variables. The shape of the expression depends on the number of variables.
Example:
Expression1: "italic(y)==a*italic(x)*b"
to become: "y=1.2 x+4.3"
Expression2: "italic(y)==a*italic(x)*b~c"
to become: "y=1.2 x+4.3 -5.3"
Currently I am using the substitute function, but it does not work along with the expression function:
substitute(expression("italic(y)==a*italic(x)*b"),list(a=1.23,b=2.3))
My expression needs to grow as the number of variables (i.e. length of the list) increases. So, next step would be to add the variable c:
substitute(expression("space1*italic(y)==a*italic(x)*b*c"),list(a=1.23,b=2.3,c=3.2))
But I need to change the expression in code without any manual intervention, and these calls do not read the variable values from the list unless I change them to the following (in which case the expression can no longer be extended, as it is not a string):
substitute(italic(y)==a*italic(x)*b*c,list(a=1.23,b=2.3,c=3.2))
How can I do this?
Here is a script which might be along the lines of what you want. We can iterate the list of replacements using a for loop, and then make a regex replacement of the placeholder in the expression with the corresponding value from the list.
lst <- list(a=1.23,b=2.3)
expression <- "italic(y)==a*italic(x)*b"
for (name in names(lst)) {
    expression <- gsub(paste0("\\b", name, "\\b"), lst[[name]], expression)
}
print(expression)
[1] "italic(y)==1.23*italic(x)*2.3"
Note carefully that I search for the variable name surrounded by word boundaries on both sides. If a placeholder were ever directly adjacent to other word characters, this solution would fail, and we would need to change the replacement logic.
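The same word-boundary replacement can be sketched in Python for comparison. The function name fill_placeholders is hypothetical, and re.escape is an extra precaution added here in case a placeholder name ever contains regex metacharacters; it is not part of the R version above.

```python
import re

def fill_placeholders(template, values):
    """Replace each standalone placeholder name with its value."""
    for name, value in values.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        # A function replacement avoids backslash processing in the value.
        template = re.sub(pattern, lambda m, v=value: str(v), template)
    return template

result = fill_placeholders("italic(y)==a*italic(x)*b", {"a": 1.23, "b": 2.3})
# result == "italic(y)==1.23*italic(x)*2.3"
```

The a inside italic is left alone because it has word characters on both sides, so no word boundary is found there.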

What is the zsh equivalent for $BASH_REMATCH[]?

What is the equivalent in zsh for $BASH_REMATCH, and how is it used?
In zsh, one can simply use
$match[1]
in place of
$BASH_REMATCH[1]
To make zsh behave the same as bash, use:
setopt BASH_REMATCH
Or within a function consider:
setopt local_options BASH_REMATCH
(this will only set the option within the scope of the function)
Then just use $BASH_REMATCH as you would in bash.
The manual says about BASH_REMATCH:
When set, matches performed with the =~ operator will set the BASH_REMATCH array variable, instead of the default MATCH and match variables. The first element of the BASH_REMATCH array will contain the entire matched text and subsequent elements will contain extracted substrings. This option makes more sense when KSH_ARRAYS is also set, so that the entire matched portion is stored at index 0 and the first substring is at index 1. Without this option, the MATCH variable contains the entire matched text and the match array variable contains substrings.
Then =~ will behave like in bash, but if you want the full behaviour as described in the manual:
string =~ regexp
true if string matches the regular expression regexp. If the option RE_MATCH_PCRE is set regexp is tested as a PCRE regular expression using the zsh/pcre module, else it is tested as a POSIX extended regular expression using the zsh/regex module. Upon successful match, some variables will be updated; no variables are changed if the matching fails.
If the option BASH_REMATCH is not set the scalar parameter MATCH is set to the substring that matched the pattern and the integer parameters MBEGIN and MEND to the index of the start and end, respectively, of the match in string, such that if string is contained in variable var the expression ‘${var[$MBEGIN,$MEND]}’ is identical to ‘$MATCH’. The setting of the option KSH_ARRAYS is respected. Likewise, the array match is set to the substrings that matched parenthesised subexpressions and the arrays mbegin and mend to the indices of the start and end positions, respectively, of the substrings within string. The arrays are not set if there were no parenthesised subexpressions. For example, if the string ‘a short string’ is matched against the regular expression ‘s(...)t’, then (assuming the option KSH_ARRAYS is not set) MATCH, MBEGIN and MEND are ‘short’, 3 and 7, respectively, while match, mbegin and mend are single entry arrays containing the strings ‘hor’, ‘4’ and ‘6’, respectively.
If the option BASH_REMATCH is set the array BASH_REMATCH is set to the substring that matched the pattern followed by the substrings that matched parenthesised subexpressions within the pattern.
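The manual's worked example can be checked against another regex engine. The sketch below uses Python's re module purely as an illustration, not zsh itself; it converts Python's 0-based, end-exclusive offsets into the 1-based, inclusive indices that zsh reports when KSH_ARRAYS is not set. The variable names simply mirror the zsh parameters.

```python
import re

m = re.search(r"s(...)t", "a short string")

# Whole match, corresponding to MATCH, MBEGIN and MEND in zsh.
MATCH = m.group(0)
MBEGIN = m.start(0) + 1   # 0-based start -> 1-based start
MEND = m.end(0)           # exclusive end -> inclusive 1-based end

# First parenthesised group, corresponding to match[1], mbegin[1], mend[1].
match1 = m.group(1)
mbegin1 = m.start(1) + 1
mend1 = m.end(1)

print(MATCH, MBEGIN, MEND)     # short 3 7
print(match1, mbegin1, mend1)  # hor 4 6
```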

How to split a string by dashes outside of square brackets

I would like to split strings like the following:
x <- "abc-1230-xyz-[def-ghu-jkl---]-[adsasa7asda12]-s-[klas-bst-asdas foo]"
by dash (-) on the condition that those dashes must not be contained inside a pair of []. The expected result would be
c("abc", "1230", "xyz", "[def-ghu-jkl---]", "[adsasa7asda12]", "s",
"[klas-bst-asdas foo]")
Notes:
There is no nesting of square brackets inside each other.
The square brackets can contain any characters / numbers / symbols except square brackets.
The other parts of the string are also variable so that we can only assume that we split by - whenever it's not inside [].
There's a similar question for python (How to split a string by commas positioned outside of parenthesis?) but I haven't yet been able to accurately adjust that to my scenario.
You could use look ahead to verify that there is no ] following sooner than a [:
-(?![^[]*\])
So in R:
strsplit(x, "-(?![^[]*\\])", perl=TRUE)
Explanation:
-: match the hyphen
(?! ): negative look ahead: if that part is found after the previously matched hyphen, it invalidates the match of the hyphen.
[^[]: match any character that is not a [
*: match any number of the previous
\]: match a literal ]. If this matches, it means we found a ] before finding a [. As all this happens in a negative look ahead, a match here means the hyphen is not a match. Note that a ] is a special character in regular expressions, so it must be escaped with a backslash (although it does work without escape, as the engine knows there is no matching [ preceding it -- but I prefer to be clear about it being a literal). And as backslashes have a special meaning in string literals (they also denote an escape), that backslash itself must be escaped again in this string, so it appears as \\].
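For comparison, the same negative lookahead also works in Python's re module. A minimal sketch, with the [ inside the character class escaped for readability (an optional choice made here, not something the R version needs):

```python
import re

x = "abc-1230-xyz-[def-ghu-jkl---]-[adsasa7asda12]-s-[klas-bst-asdas foo]"

# Split on a hyphen unless a ']' follows before any '['.
parts = re.split(r"-(?![^\[]*\])", x)

for part in parts:
    print(part)
```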
Instead of splitting, extract the parts:
library(stringr)
str_extract_all(x, "(\\[[^\\[]*\\]|[^-])+")
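The extraction approach carries over to Python in much the same way. In this sketch the group is made non-capturing, because re.findall returns only the group contents when a pattern has a capturing group; that change is specific to Python and is not needed with str_extract_all.

```python
import re

x = "abc-1230-xyz-[def-ghu-jkl---]-[adsasa7asda12]-s-[klas-bst-asdas foo]"

# Each part is a run of bracketed groups and/or single non-hyphen characters.
parts = re.findall(r"(?:\[[^\[]*\]|[^-])+", x)

for part in parts:
    print(part)
```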
I am not familiar with the R language, but I believe it can do regex-based search and replace. Instead of struggling with a single regex split, I would go in 3 steps:
replace - in all [....] parts with an invisible character, like \x99
split by -
for each element in the above split result (array/list), replace \x99 back with -
For the first step, you can find the parts by \[[^]]
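The three steps above could look roughly like this in Python. It is a sketch under two assumptions: the placeholder character \x99 never occurs in the input, and the bracketed parts are located with the pattern \[[^\]]*\] (chosen here for illustration). The function name split_outside_brackets is hypothetical.

```python
import re

PLACEHOLDER = "\x99"  # assumed not to occur in the input

def split_outside_brackets(text):
    # Step 1: hide the hyphens inside every [...] part.
    protected = re.sub(
        r"\[[^\]]*\]",
        lambda m: m.group(0).replace("-", PLACEHOLDER),
        text,
    )
    # Step 2: split on the hyphens that are still visible.
    pieces = protected.split("-")
    # Step 3: restore the hidden hyphens in each piece.
    return [piece.replace(PLACEHOLDER, "-") for piece in pieces]

x = "abc-1230-xyz-[def-ghu-jkl---]-[adsasa7asda12]-s-[klas-bst-asdas foo]"
parts = split_outside_brackets(x)
```

This trades a single dense regex for three simple operations, which can be easier to debug.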

How do I write a regular expression that will match if the 6th character of a string is one of two different letters?

I'm trying to write a validator for an ASP.NET text box.
How can I validate so the regular expression will only match if the 6th character is a "C" or a "P"?
^.{5}[CP] will match strings starting with any five characters and then a C or P.
Depending on exactly what you want, you are looking for something like:
^.{5}[CP]
The ^ says to start from the beginning of the string, the . defines any character, the {5} says that the . must match 5 times, then the [CP] says the next character must be part of the character class CP - i.e. either a C or a P.
^.{5}[CP] -- the trick is the {}, which repeats the preceding element a given number of times.
^.{5}[CP] has a few important pieces:
^ = from the beginning
. = match anything
{5} = make the previous match the number of times in braces
[CP] = match any one of the specific items in brackets
so the regex spoken would be something like "from the beginning of the string, match anything five times, then match a 'C' or 'P'"
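The pattern can be tried quickly outside ASP.NET. The following Python sketch is only an illustration of the regex itself, not of the ASP.NET validator; the helper name sixth_char_is_c_or_p is made up.

```python
import re

SIXTH_IS_C_OR_P = re.compile(r"^.{5}[CP]")

def sixth_char_is_c_or_p(text):
    """Hypothetical helper: True if the 6th character is 'C' or 'P'."""
    return SIXTH_IS_C_OR_P.match(text) is not None

samples = ["abcdeC", "12345Pxyz", "abcdeX", "abcC"]
checks = {s: sixth_char_is_c_or_p(s) for s in samples}
# checks == {"abcdeC": True, "12345Pxyz": True, "abcdeX": False, "abcC": False}
```

Note that the pattern places no constraint on anything after the sixth character.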
[a-zA-Z0-9]{5}[CP] will match any five letters or digits and then a C or P.

Resources