Regular expression for x number of digits and only one hyphen? - asp.net

I made the following regex:
(\d{5}|\d-\d{4}|\d{2}-\d{3}|\d{3}-\d{2}|\d{4}-\d)
And it seems to work. That is, it will match a 5 digit number, or a 5 digit number with exactly 1 hyphen in it, where the hyphen cannot be the first or the last character.
I would like a similar regex, but for a 25 digit number. If I use the same tactic as above, the regex will be very long.
Can anyone suggest a simpler regex?
Additional Notes:
I'm putting this regex into an XML file which is to be consumed by an ASP.NET application. I don't have access to the .NET backend code, but I suspect they would do something like this:
Match match = Regex.Match("Something goes here", "my regex", RegexOptions.None);

You need to use a lookahead:
^(?:\d{25}|(?=\d+-\d+$)[\d\-]{26})$
Explanation:
Either it's \d{25} from start to end, 25 digits.
Or: it is 26 characters of [\d\-] (digits or hyphen) AND it matched \d+-\d+ - meaning it has exactly one hyphen in the middle.
Working example with test cases
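For anyone who wants to check the pattern locally, here is a minimal sketch in Python rather than .NET (an assumption on my part, since the backend code isn't available). The helper name is_valid and the sample strings are made up for illustration. Python's re handles this particular pattern the same way for plain ASCII input, but treat it as a quick check, not a guarantee about the ASP.NET side.

```python
import re

# Pattern copied from the answer above: either 25 digits, or 26 characters
# of digits/hyphen where the lookahead forces digits-hyphen-digits.
PATTERN = re.compile(r"^(?:\d{25}|(?=\d+-\d+$)[\d\-]{26})$")


def is_valid(s):
    """Hypothetical helper: True when s passes the pattern."""
    return PATTERN.match(s) is not None


digits = "1234567890123456789012345"  # 25 digits, made-up sample

samples = {
    "plain 25 digits": digits,
    "one interior hyphen": digits[:10] + "-" + digits[10:],
    "leading hyphen": "-" + digits,
    "trailing hyphen": digits + "-",
    "two hyphens, 26 chars": digits[:5] + "-" + digits[5:10] + "-" + digits[10:24],
}

for label, value in samples.items():
    print(label, "->", is_valid(value))
```

Only the first two samples should pass; the other three are rejected either by the lookahead or by the length check.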

You could use this regex:
^[0-9](?:(?=[0-9]*-[0-9]*$)[0-9-]{24}|[0-9]{23})[0-9]$
The lookahead makes sure there's only 1 dash and the character class makes sure there are 23 numbers between the first and the last. Might be made shorter though I think.
EDIT: Here's a 'bit' shorter one xP
^(?:[0-9]{25}|(?=[^-]+-[^-]+$)[0-9-]{26})$
A bit similar to Kobi's though, I admit.

If you aren't fussy about the length at all (i.e. you only want a string of digits with an optional hyphen) you could use:
\d+-\d+|\d+
(You may want to add line/word boundaries to this, depending on your circumstances)
If you need to have a specific length of match, this pattern doesn't really work. Kobi's answer is probably a better fit for you.

I think the fastest way is to do a simple match and then add up the lengths of the capture groups. Why attempt math in a regex? It makes no sense.
^(\d+)-?(\d+)$
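A rough sketch of that idea, written in Python because the .NET code isn't available to test against. The function name has_n_digits and its default of 25 are made up for the example. The regex only checks the shape (digits, optional hyphen, digits) and ordinary code does the counting.

```python
import re

# Shape check only: digits, an optional single hyphen, digits.
SHAPE = re.compile(r"^(\d+)-?(\d+)$")


def has_n_digits(s, n=25):
    """Hypothetical helper: shape via regex, digit count via plain arithmetic."""
    m = SHAPE.fullmatch(s)
    if m is None:
        return False
    # The two groups hold every digit; the hyphen, if present, is in neither.
    return len(m.group(1)) + len(m.group(2)) == n


digits = "1234567890123456789012345"  # 25 digits, made-up sample
print(has_n_digits(digits))
print(has_n_digits(digits[:12] + "-" + digits[12:]))
print(has_n_digits("12-345", n=5))
```

This obviously only helps if you control the calling code, which the question says is not the case here.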

This will match exactly 25 digits with exactly one hyphen in the middle (the $ inside the lookahead stops a run of 26 digits from slipping through):
^(?=(-*\d){25}$)\d.{24}\d$

Related

Split a string in a flexible manner with a regular expression

Context: I need to split strings that are too long and that are used as column headers in an html table. Those strings are variable names, so they don't have any spaces in them.
If I let the css max-width property do the job, the string is split at a fixed place, not making use of the dots or _'s in the string.
For example, suppose I have this string:
this.is.a.long.string.indeed.yeah.well.you.know
Using the dots as separators, I can split it in many, many different ways. But I pose these guiding principles:
All substrings must be 12 characters or less
Separators [._] should be at the end, not at the beginning of a substring
The number of substrings must be minimal
If several solutions exist, the one having the most similar substring lengths is to be preferred.
I could do this programmatically with R, but I'm turning to regex wizards to see whether this is possible using solely regular expressions.
What I have so far:
Regex: .{1,12}(_|\b|\Z)
Results: this.is.a. | long.string. | indeed.yeah. | well.you. | know
It works well, except when there is a long sequence of letters without any separators. Please see this example on regex101.com.
Ideally, separators would be used whenever possible, and a fallback split would occur when there is a sequence longer than 12 characters without a separator.
You were so close, you just need to present it with another alternative for cases where no separator is found:
.{1,12}(_|\b|\Z)|.{1,12}
Check it out: https://regex101.com/r/XrJuYj/2/
Edit: to ensure the split portion contains a non-separating character, you can use the following:
(?=.{1,12}(.*))(?=.*?[^\W_].*?[\W_].*?\1).{1,12}(?<=_|\b|\Z)|.{1,12}
See it at: https://regex101.com/r/XrJuYj/3
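To see what the simpler pattern with the fallback alternative produces, here is a small sketch in Python (chosen only because it is easy to run; the question itself mentions R). The function name split_header is made up. Note that \Z means "end of string" in Python's re, which is close enough to the regex101 behaviour for single-line input.

```python
import re

# First alternative: up to 12 characters ending at a separator or boundary.
# Second alternative: plain fallback of up to 12 characters.
SPLITTER = re.compile(r".{1,12}(_|\b|\Z)|.{1,12}")


def split_header(name):
    """Hypothetical helper: return the chunks the pattern matches, in order."""
    return [m.group(0) for m in SPLITTER.finditer(name)]


print(" | ".join(split_header("this.is.a.long.string.indeed.yeah.well.you.know")))
# this.is.a. | long.string. | indeed.yeah. | well.you. | know

# No separators at all: the fallback alternative cuts every 12 characters.
print(split_header("abcdefghijklmnopqrstuvwxyz"))
```

This only covers the first two guiding principles; it does nothing to balance the substring lengths.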

RegEx to check the string contains at least one alphabet or digit

I want a regular expression that checks that a string contains at least one letter [a-zA-Z] or digit. All other special characters are allowed, but strings made up of only special characters, only spaces, or only spaces and special characters will not be accepted.
I have tried /\b(?=[A-Z]*[0-9])(?=[0-9]*[A-Z])[\s\S]\b/i and ^(a-zA-Z0-9).*[\s\S]*$ and ^(a-zA-Z0-9).*[\s].*[\S]*$ etc., but none of them work. Any help would be appreciated.
Thanks
^(?=.*[\w\d]).+
This pattern will fail if there is not at least one letter or digit somewhere among the special characters and spaces. Note that \w also matches the underscore (and already includes digits, so the \d is redundant).
I'm not sure I understood you correctly, but from what I've gathered you want to have at least one letter or digit (a-z, 0-9) in the string. This regex will do just that: /^(?=.*[a-z\d]).+/igm
(Set the flags however they need to be set in asp.net. The m-flag might be redundant for you, I only used it for the demo. The g-flag likely does not exist. If so, just remove it.)
Demo+explanation: http://regex101.com/r/jY9fJ5
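A quick way to compare the two patterns so far, sketched in Python (an assumption, since the target is ASP.NET; the i flag becomes re.IGNORECASE and the g/m flags are dropped). The names HAS_ALNUM and HAS_WORD_CHAR are made up. The interesting case is a lone underscore, which \w accepts and [a-z\d] does not.

```python
import re

# Pattern from this answer: at least one ASCII letter or a digit somewhere.
HAS_ALNUM = re.compile(r"^(?=.*[a-z\d]).+", re.IGNORECASE)

# Pattern from the earlier answer: \w also covers the underscore.
HAS_WORD_CHAR = re.compile(r"^(?=.*[\w\d]).+")


def accepts(pattern, s):
    """Hypothetical helper: True when the pattern matches at the start of s."""
    return pattern.match(s) is not None


for sample in ["!@# a $%", "7", "   ", "!@#$", "_"]:
    print(repr(sample), accepts(HAS_ALNUM, sample), accepts(HAS_WORD_CHAR, sample))
```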
If you want at least one letter or digit, followed by only spaces and symbols:
/^.*[a-zA-Z0-9][^a-zA-Z0-9]*$/
If you want exactly one letter or digit, with only spaces and symbols around it:
/^[^a-zA-Z0-9]*[a-zA-Z0-9][^a-zA-Z0-9]*$/
I can't imagine what else it is that you are looking for. Examples would help immensely.
(?=.*?[0-9])(?=.*?[A-Za-z]).+
Allows special characters and requires at least one digit and one letter.
(?=.*?[0-9])(?=.*?[A-Za-z])(?=.*[^0-9A-Za-z]).+
Demands at least one letter, one digit and one special-character.
The first one does not demand special chars, only allows them.

ASP.Net Regular Expression Validator no spaces

I'm trying to set up a validation expression for an ASP.Net Regular Expression Validator control. It is for validating the creation of a user name, so I want to limit the number of characters, and I also want to prevent them from using spaces. Here's what I've got so far:
^.*(?=.{5,20})(?=.*\w{5,255}).*$
The \w{5,255} part prevents spaces and special characters (except for underscores, apparently). I have no idea how "5,255" makes it work, but it does; I just copied it from somewhere else.
The main problem I'm having is that if the first or last character is a space (or special character), it passes validation, which is not acceptable. Can anyone help me? I'm sure it is something simple, but I know next to nothing about regular expressions.
You can use something simpler like this:
^[a-zA-Z0-9_]{5,255}$
This will allow usernames of 5 to 255 characters made up of letters, digits and underscores, with no spaces anywhere. Change 255 to 20 if that is your real upper limit.
(let's expand overall understanding of how to at least use regex!)
The main reason why the posted regex wasn't working is because you were attempting to use lookahead. Lookahead is a 0-length pattern that just guarantees that the next part of the string will match a certain pattern (and is usually used to take advantage of it being 0-length, so it doesn't expand your capturing group).
Effectively, what your regex (going off of the original /^.(?=.{5,20})(?=.\w{5,255}).*$/) meant was:
^. "The beginning of our line should match any single character (provided it's not a newline, although this depends on the regex implementation as well as flags that may or may not have been passed in)"
(?= "and guarantee that after here"
.{5,20}) "are any 5-20 characters."
(?= "Also, after that same first character (since, remember, lookahead is 0-length), guarantee"
. "one arbitrary character"
\w{5,255}) "and 5-255 word characters."
.*$ "And of course, since all of that exhaustive matching was 0-length, we want the rest of the line to be an arbitrary number of characters."
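What you technically could have done to use lookaround was ^(?=\w{5,255}$).{5,255}$ (the $ inside the lookahead is what forces every character, not just the first five, to be a word character), but that's just overly convoluted. I'd suggest just using \w{5,255} or something along those lines.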
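To check the simple character-class version against the cases from the question (leading or trailing space), here is a small sketch in Python. It is not the validator itself: re.fullmatch stands in for the whole-input matching that a RegularExpressionValidator is assumed to perform, and the name is_valid_username is made up.

```python
import re

# Letters, digits and underscore only, 5 to 255 characters.
USERNAME = re.compile(r"[a-zA-Z0-9_]{5,255}")


def is_valid_username(name):
    """Hypothetical helper: the whole string must match, start to end."""
    return USERNAME.fullmatch(name) is not None


for candidate in ["alice_01", " alice", "alice ", "bob smith", "al"]:
    print(repr(candidate), is_valid_username(candidate))
```

Only the first candidate passes; a space anywhere, including at either end, is enough to reject the name.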

What is wrong with this Regex "^(.|\s){1,280}$"

Should be validating 1-280 input characters, but it hangs when more than 280 characters are input.
Clarification
I am using the above regex to validate the length of input string to be 280 characters maximum.
I am using asp:RegularExpressionValidator to do that.
There's nothing “wrong” with it per se, but it's horrendous: with most RE engines (you don't say which one you're using), when the pattern doesn't match on the first attempt, the engine backtracks and tries loads of different possibilities (none of which can ever produce a match). So it's not a hang, but rather just a machine that's trying to execute around 2^280 operations to see if there's a match possible. Excuse me if I don't wait around for that!
Of course, it's theoretically possible for the RE compiler to merge the (.|\s) part of the RE into something it doesn't need to backtrack to deal with. Some RE engines do this (typically the more automata-theoretic ones) but many don't (the stack-based ones).
It is trying every possible combination of . and \s for each character trying to find a version of the pattern that matches the string.
. already matches any character, so (.|\s) is redundant. Further, if you just want to check what the length of the string is, then just do that - why are you pulling out regexes?
If you really want to use a regular expression, you could use ^.{1,280}$ (no space inside the braces) combined with the Singleline option (RegexOptions.Singleline), so that the . metacharacter will match everything, including new lines (see here, Regular Expression API section).
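Both suggestions (a plain length check, and a single . with the single-line option) can be sketched like this in Python. This is only an illustration: re.DOTALL is Python's counterpart of the Singleline option, and the function names are made up. The original (.|\s) pattern is deliberately not run here, because an over-length input is exactly what makes it blow up.

```python
import re

MAX_LEN = 280

# A single "any character" class, so there is nothing ambiguous to backtrack over.
LENGTH_RE = re.compile(r".{1,280}", re.DOTALL)


def valid_by_length(text):
    """Hypothetical helper: no regex at all, just compare the length."""
    return 1 <= len(text) <= MAX_LEN


def valid_by_regex(text):
    """Hypothetical helper: same rule expressed as a regex over the whole input."""
    return LENGTH_RE.fullmatch(text) is not None


for sample in ["", "hello\nworld", "x" * 280, "x" * 281]:
    print(len(sample), valid_by_length(sample), valid_by_regex(sample))
```

The two functions agree on every sample, and the over-length input is rejected immediately instead of hanging.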

Use a RegularExpressionValidator to limit a word count?

I want to use an ASP.NET RegularExpressionValidator to limit the number of words in a text box. (The RegularExpressionValidator is my favoured solution because it will do both client and server side checks).
So what would be the correct Regex to put in the RegularExpressionValidator that will count the words and enforce a word-limit? For, lets say, 150 words.
(NB: I see that this question is similar, but the answers given seem to also rely on code such as Split() so I don't think any of them could plug into a RegularExpressionValidator which is why I'm asking again)
Since ^ and $ are implicitly set with RegularExpressionValidators, use the following:
(\S*\s*){0,150}
The 0 here allows empty strings (more specifically 0 words) and 150 is the max number of words to accept. Adjust these as necessary to increase/decrease the number of words accepted.
The above regex avoids a greedy .*, so you should get a quicker match than with the one given in the question you reference: (\b.*\b){0,10} is greedy, so as you increase the number of words you'll see a decrease in performance.
Here is a quick reference for regular expressions:
http://msdn.microsoft.com/en-us/library/az24scfc.aspx
You can use this site to test the expressions:
http://regexpal.com/
Here is my regex example that works with both minimum and maximum word count (and fixes bug with leading spacing):
^\s*(\S+\s+|\S+$){10,150}$
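A quick sanity check of the minimum/maximum version, sketched in Python instead of the validator itself (so an approximation). The helpers within_word_limits and make_text are made up, and the test text is just the word "word" repeated.

```python
import re

# Pattern from this answer: between 10 and 150 whitespace-separated words.
WORD_RANGE = re.compile(r"^\s*(\S+\s+|\S+$){10,150}$")


def within_word_limits(text):
    """Hypothetical helper: True when the word count is within 10..150."""
    return WORD_RANGE.match(text) is not None


def make_text(n):
    """Hypothetical helper: build a string of n space-separated words."""
    return " ".join(["word"] * n)


for count in (9, 10, 150, 151):
    print(count, within_word_limits(make_text(count)))

# Leading and trailing whitespace should not change the result.
print(within_word_limits("   " + make_text(10) + "  "))
```

Because \S+ must be followed by whitespace or the end of the input, a word can't be split across two repetitions, so each repetition counts exactly one word.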
Check this site:
http://lawrence.ecorp.net/inet/samples/regexp-validate.php#count
It's JavaScript RegEx, but it's very similar to asp.net
It's something like this:
(\b[a-z0-9]+\b.*){4,}
