regular expression to validate that the 5th or 6th character is "-" - asp.net

I am facing a problem with a regular expression in ASP.NET.
I need to validate that the 5th or 6th character is "-".
For example:
3000-4567, 3000-4568: this string is comma-separated, and each part also contains a hyphen. I just need to check that the 5th or 6th character of each comma-separated part is "-".
The regular expression currently used in the system is
^((\s*\d{4,4}\s*[,]){1,3}?)?(\s*\d{4,4})*$
Currently it validates 3000,4567.

I've made two slight changes to your regex:
'^((\s*\d{4,5}\s*[\-]){1,3}?)?(\s*\d{4,4})*$'
I changed the cardinality of the first numeric group to {4,5} to allow 5-digit numbers (which I guess is what you want, since the dash can be the sixth character), and I changed the separator to a dash. Notice the backslash escaping the dash: inside square brackets a dash can denote a range (though you probably don't need the brackets there at all).
As an alternative, consider splitting the string on instances of - and then validating the resulting chunks. That should be much easier.
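The pattern above contains no comma, so it accepts a single value such as 3000-4567 but not the comma-separated list from the question. Below is a minimal Python sketch of the splitting idea. It splits on the commas instead and checks each trimmed part. The function name all_parts_valid is hypothetical, and the code assumes parts are separated by "," with optional spaces.

```python
import re

# Matches when the 5th or 6th character (index 4 or 5) is "-":
# ".{4,5}" consumes exactly 4 or 5 characters of anything before the dash.
FIFTH_OR_SIXTH_IS_DASH = re.compile(r"^.{4,5}-")


def all_parts_valid(text):
    """Split on commas, trim spaces, and check every part (hypothetical helper)."""
    parts = [part.strip() for part in text.split(",")]
    return all(FIFTH_OR_SIXTH_IS_DASH.match(part) for part in parts)


print(all_parts_valid("3000-4567, 3000-4568"))  # True
print(all_parts_valid("30000-4567,3000-4568"))  # True: dash is 6th in the first part
print(all_parts_valid("3000,4567"))             # False: no dash at all
```

Because ".{4,5}" is greedy, it first tries five characters before the dash and then falls back to four. That covers both allowed positions without two separate patterns.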


Remove 1 character from both sides of a recurring substring where one digit changes in each occurrence

I need to remove apostrophes from both sides of a substring. The substring occurs numerous times within a starting string, and one digit changes within the substring for each occurrence.
starting_string = "{'color':'Highcharts.getOptions().colors[0]','color':'Highcharts.getOptions().colors[1]','color':'Highcharts.getOptions().colors[2]'}"
substring = Highcharts.getOptions().colors[i]
desired_string = "{'color':Highcharts.getOptions().colors[0],'color':Highcharts.getOptions().colors[1],'color':Highcharts.getOptions().colors[2]}"
Above, in 'substring', 'i' represents the digit that changes in each occurrence of the substring.
The number of times 'substring' occurs in 'starting_string' will vary. This example is simplified.
gsub("'(Highcharts\\.getOptions\\(\\)\\.colors\\[[0-9]+\\])'",
"\\1", starting_string)
# [1] "{'color':Highcharts.getOptions().colors[0],'color':Highcharts.getOptions().colors[1],'color':Highcharts.getOptions().colors[2]}"
Explanation of the regex:
the parens (Hig...) define a group that we'll reference later using \\1;
the enveloping ' are the literal single quotes; note that these are outside the paren-group, as we will want to drop them once we find them;
I took the liberty of inferring that i means "any number", so I replaced it with [0-9]+ which means "one or more digit".
many characters have special meaning in regex, so they are backslash-escaped; here, they are (, ), [, ] and the period (.). For the record, I might have been able to omit all of the backslashes and use fixed=TRUE instead, except that we want to be able to match arbitrary numbers in [i].
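For comparison, here is the same capture-and-replace idea as a sketch in Python's re module. It is not the answer's R code. re.sub takes the pattern, the replacement and the string in that order, and r"\1" refers to group 1, just as "\\1" does in gsub.

```python
import re

starting_string = ("{'color':'Highcharts.getOptions().colors[0]',"
                   "'color':'Highcharts.getOptions().colors[1]',"
                   "'color':'Highcharts.getOptions().colors[2]'}")

# The quotes sit outside group 1, so replacing the whole match with
# group 1 keeps the Highcharts reference and drops the quotes.
pattern = r"'(Highcharts\.getOptions\(\)\.colors\[[0-9]+\])'"
desired = re.sub(pattern, r"\1", starting_string)
print(desired)
```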

How to split a string by dashes outside of square brackets

I would like to split strings like the following:
x <- "abc-1230-xyz-[def-ghu-jkl---]-[adsasa7asda12]-s-[klas-bst-asdas foo]"
by dash (-) on the condition that those dashes must not be contained inside a pair of []. The expected result would be
c("abc", "1230", "xyz", "[def-ghu-jkl---]", "[adsasa7asda12]", "s",
"[klas-bst-asdas foo]")
Notes:
There is no nesting of square brackets inside each other.
The square brackets can contain any characters / numbers / symbols except square brackets.
The other parts of the string are also variable so that we can only assume that we split by - whenever it's not inside [].
There's a similar question for python (How to split a string by commas positioned outside of parenthesis?) but I haven't yet been able to accurately adjust that to my scenario.
You could use a lookahead to verify that no ] follows before the next [:
-(?![^[]*\])
So in R:
strsplit(x, "-(?![^[]*\\])", perl=TRUE)
Explanation:
-: match the hyphen
(?! ): negative lookahead: if that part is found right after the previously matched hyphen, it invalidates the match of the hyphen.
[^[]: match any character that is not a [
*: match any number of the previous
\]: match a literal ]. If this matches, it means we found a ] before finding a [. As all this happens in a negative look ahead, a match here means the hyphen is not a match. Note that a ] is a special character in regular expressions, so it must be escaped with a backslash (although it does work without escape, as the engine knows there is no matching [ preceding it -- but I prefer to be clear about it being a literal). And as backslashes have a special meaning in string literals (they also denote an escape), that backslash itself must be escaped again in this string, so it appears as \\].
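The same lookahead should carry over to Python's re module, which supports lookahead without any extra flag (R needs perl=TRUE because its default engine does not). Here is a minimal sketch. The [ inside the character class is escaped only for readability.

```python
import re

x = "abc-1230-xyz-[def-ghu-jkl---]-[adsasa7asda12]-s-[klas-bst-asdas foo]"

# Split on "-" unless a "]" appears before the next "[":
# brackets never nest, so such a "]" means the dash is inside [...].
parts = re.split(r"-(?![^\[]*\])", x)
print(parts)
```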
Instead of splitting, extract the parts:
library(stringr)
str_extract_all(x, "(\\[[^\\[]*\\]|[^-])+")
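The extract-instead-of-split idea can be sketched in Python with re.findall. One difference from the R call is that the group has to be non-capturing (?:...). With a capturing group, findall returns only the group's last repetition instead of the whole match.

```python
import re

x = "abc-1230-xyz-[def-ghu-jkl---]-[adsasa7asda12]-s-[klas-bst-asdas foo]"

# Each token is a run of whole [...] groups or single non-dash characters;
# the bracket alternative is tried first, so dashes inside [...] are kept.
tokens = re.findall(r"(?:\[[^\[]*\]|[^-])+", x)
print(tokens)
```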
I am not familiar with the R language, but I believe it can do regex-based search and replace. Instead of struggling with one single regex split, I would do it in 3 steps:
replace every - inside the [....] parts with an invisible character, like \x99
split by -
for each element of the split result (array/list), replace \x99 back with -
For the first step, you can find the parts by \[[^]]
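Here is a sketch of those three steps in Python. Two things are assumptions rather than part of the answer: the function name split_outside_brackets is hypothetical, and the bracket pattern \[[^\]]*\] (a [, any non-] characters, then ]) is one way to find the [...] parts. It also assumes that \x99 never occurs in the input.

```python
import re

PLACEHOLDER = "\x99"  # assumption: this character never occurs in the input


def split_outside_brackets(s):
    """Hypothetical helper: protect bracketed dashes, split, then restore."""
    # Step 1: inside every [...] part, replace "-" with the placeholder.
    protected = re.sub(r"\[[^\]]*\]",
                       lambda m: m.group(0).replace("-", PLACEHOLDER), s)
    # Step 2: split on the dashes that are left, i.e. those outside brackets.
    pieces = protected.split("-")
    # Step 3: turn the placeholder back into "-" in every piece.
    return [piece.replace(PLACEHOLDER, "-") for piece in pieces]


x = "abc-1230-xyz-[def-ghu-jkl---]-[adsasa7asda12]-s-[klas-bst-asdas foo]"
print(split_outside_brackets(x))
```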

How to extract characters from a string based on the text surrounding them in R

I'm using the R language, and I have many large lists of character strings with a similar format. I am interested in the characters directly in front of a series of characters that always appears in the string, but not at a consistent position within it. For instance:
a <- "aabbccddeeff"
b <- "aabbddff"
c <- "aabbffgghhii"
d <- "bbffgghhii"
I am interested in extracting the two characters directly preceding the "ff" in each character string. I can't find any reasonable solution apart from breaking each character string down using grepl() and then processing them each independently, which seems like an inefficient way to do it.
You can match those two characters and capture them with sub and the right regular expression.
Strings = c("aabbccddeeff",
"aabbddff",
"aabbffgghhii",
"bbffgghhii")
sub(".*(\\w\\w)ff.*", "\\1", Strings)
[1] "ee" "dd" "bb" "bb"
Explanation: this replaces the entire string with the two characters before the "ff". If there are multiple "ff" in the string, this expression takes the two characters before the last "ff", because the leading .* is greedy.
How this works: The three arguments to sub are:
1. a pattern to search for
2. What it will be replaced with
3. The strings to apply it to.
Most of the work is in the pattern part, .*(\\w\\w)ff.*. The ff part of the pattern should be obvious: we are targeting things near the specific string ff. What comes right before it is (\\w\\w). \w refers to a "word character", which means any letter a-z or A-Z, any digit 0-9, or the one other character _. We want two characters, so we have \\w\\w. Enclosing \\w\\w in parentheses turns this pattern of two characters into a "capture group", a substring that is saved for later use. Since this is the first (and only) capture group in the expression, those two characters can be referred to as \1.
Now we want only those two characters, so to blow away everything before and after them we put .* at the front and back. . matches any character and * means "zero or more times", so .* means zero or more copies of any character. Now we have broken the string into four parts: "ff", the two characters before "ff", everything before that, and everything after the ff. This covers the entire string.
sub will replace the part that was matched (everything) with whatever the replacement argument says, in this case "\\1". That is how you write a string that evaluates to \1, the reference to the group where we stored the two characters we want. We write it that way because a backslash in an R string "escapes" whatever comes after it. We actually want the character \, so we write \\ to indicate \, and "\\1" evaluates to \1. So everything in the string is replaced by the targeted two characters. We apply this to every string in the vector Strings.
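Here is the same pattern as a sketch in Python, to show that the technique is not R-specific. re.sub plays the role of sub, and r"\1" is the raw-string way of writing the \1 backreference without doubling the backslash. As with R's sub, a string that contains no "ff" comes back unchanged.

```python
import re

strings = ["aabbccddeeff", "aabbddff", "aabbffgghhii", "bbffgghhii"]

# Greedy ".*" pushes the match to the LAST "ff"; group 1 keeps the two
# word characters before it, and the replacement r"\1" discards the rest.
result = [re.sub(r".*(\w\w)ff.*", r"\1", s) for s in strings]
print(result)  # ['ee', 'dd', 'bb', 'bb']
```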

regex to allow alphanumeric characters on both sides of an equal sign?

I need a regular expression to validate whether text entered in an ASP.NET textbox has the following format
A-Za-z123456789 \s = \s A-Za-z123456789
Regular expression explained:
one or more alphanumeric characters
followed by any number of spaces
an equal sign
followed by any number of spaces
one or more alphanumeric characters
[a-zA-Z0-9]*\s*\=\s*[a-zA-Z0-9]*
Replace * with + if you want one or more rather than "any" (which includes zero)
Considering your answer to the comment about requiring one or more alphanumeric characters each side:
[a-zA-Z0-9]+\s*\=\s*[a-zA-Z0-9]+
This version will only match if there is at least one alphanumeric character each side of the "=".
If the digit 0 is valid:
"^[a-zA-Z\\d]+\\s*=\\s*[a-zA-Z\\d]+$"
If the digit 0 is not valid (only 1-9, as in the question):
"^[a-zA-Z1-9]+\\s*=\\s*[a-zA-Z1-9]+$"
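Here is a quick sketch for trying both variants outside ASP.NET, using Python's re module. Here fullmatch stands in for the ^...$ anchors. The explicit 0-9 replaces \d because in both .NET and Python 3, \d can also match non-ASCII digits. The helper name check is hypothetical.

```python
import re

# Digit 0 allowed on both sides of "=".
WITH_ZERO = re.compile(r"[a-zA-Z0-9]+\s*=\s*[a-zA-Z0-9]+")
# Only digits 1-9 allowed, mirroring the question's "123456789".
WITHOUT_ZERO = re.compile(r"[a-zA-Z1-9]+\s*=\s*[a-zA-Z1-9]+")


def check(pattern, text):
    # fullmatch requires the whole text to match, like ^...$
    return pattern.fullmatch(text) is not None


print(check(WITH_ZERO, "abc10 = x2"))     # True
print(check(WITHOUT_ZERO, "abc10 = x2"))  # False: contains 0
print(check(WITH_ZERO, " = x2"))          # False: left side is empty
```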

Regular expression to match a maximum of five words

I have a regular expression
^[a-zA-Z+#-.0-9]{1,5}$
which validates that the word contains only alphanumeric characters and a few special characters, and is no more than 5 characters long.
How do I make this regular expression accept a maximum of five words, each matching the above regular expression?
^[a-zA-Z+#\-.0-9]{1,5}(\s[a-zA-Z+#\-.0-9]{1,5}){0,4}$
Also, you could use for example [ ] instead of \s if you just want to accept space, not tab and newline. And you could write [ ]+ (or \s+) for any number of spaces (or whitespaces), not just one.
Edit: Removed the invalid solution and fixed the bug mentioned by unicornaddict.
I believe this may be what you're looking for. It forces at least one word of your desired pattern, then zero to four of the same, each preceded by one or more white-space characters:
^XX(\s+XX){0,4}$
where XX is your actual one-word regex.
It's separated into two distinct sections so that you're not required to have white-space at the end of the string. If you want to allow for such white-space, simply add \s* at that point. For example, allowing white-space both at start and end would be:
^\s*XX(\s+XX){0,4}\s*$
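Here is a sketch of the ^\s*XX(\s+XX){0,4}\s*$ template in Python, with XX filled in by the single-word pattern from the question (hyphen escaped). The names WORD, MAX_FIVE_WORDS and is_valid are illustrative, not from the answers. fullmatch replaces the ^ and $ anchors, and {{0,4}} is how a literal {0,4} is written inside an f-string.

```python
import re

# WORD is the question's single-word pattern, with the hyphen escaped so
# that "#", "-" and "." are matched literally rather than as a range.
WORD = r"[a-zA-Z+#\-.0-9]{1,5}"

# One word, then zero to four more, each preceded by whitespace; the
# optional \s* at both ends allows leading and trailing whitespace.
MAX_FIVE_WORDS = re.compile(rf"\s*{WORD}(?:\s+{WORD}){{0,4}}\s*")


def is_valid(text):
    # fullmatch requires the whole string to match, like ^...$
    return MAX_FIVE_WORDS.fullmatch(text) is not None


print(is_valid("c# .net a-b 12 x"))  # True: five words
print(is_valid("a b c d e f"))       # False: six words
print(is_valid("abcdef"))            # False: word longer than 5 characters
```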
Your regex has a small bug. It is meant to match letters, digits, +, #, hyphen and period, but a hyphen between two characters in a character class acts as a range operator, so #-. matches every character from # to . in ASCII order (#$%&'()*+,-.), not just those three. To avoid this, you'll have to escape the hyphen:
^[a-zA-Z+#\-.0-9]{1,5}$
Or drop the range and put the hyphen at the beginning or end of the character class, so that it's treated literally:
^[-a-zA-Z+#.0-9]{1,5}$
^[a-zA-Z+#.0-9-]{1,5}$
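Here is a quick check of the range behaviour described above, as a sketch using Python's re module. It reads #-. inside a character class as a range in the same way. The sample characters are arbitrary picks from between # and . in ASCII.

```python
import re

buggy = re.compile(r"[a-zA-Z+#-.0-9]")  # "#-." is a range: every char from # to .
fixed = re.compile(r"[a-zA-Z+#.0-9-]")  # trailing "-" is a literal hyphen

sample = "$%&'()*,-"
print([ch for ch in sample if buggy.fullmatch(ch)])  # ['$', '%', '&', "'", '(', ')', '*', ',', '-']
print([ch for ch in sample if fixed.fullmatch(ch)])  # ['-']
```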
Now to match a max of 5 such words you can use:
^(?:[a-zA-Z+#\-.0-9]{1,5}\s+){1,5}$
EDIT: This solution has a severe limitation: it matches only inputs that end in whitespace. To overcome this limitation, see the answer by Jakob.
