I want to use an ASP.NET RegularExpressionValidator to limit the number of words in a text box. (The RegularExpressionValidator is my favoured solution because it does both client-side and server-side checks.)
So what would be the correct regex to put in the RegularExpressionValidator to count the words and enforce a word limit of, let's say, 150 words?
(NB: I see that this question is similar, but the answers given there also seem to rely on code such as Split(), so I don't think any of them could plug into a RegularExpressionValidator, which is why I'm asking again.)
Since ^ and $ are implicitly set with RegularExpressionValidators, use the following:
(\S*\s*){0,150}
The 0 here allows empty strings (more specifically, 0 words) and 150 is the maximum number of words to accept. Adjust these as necessary to increase or decrease the number of words accepted.
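The above regex should give you a quicker match than the one given in the question you reference, because \S* and \s* can never match the same character. (\b.*\b){0,10} relies on .*, which can consume the rest of the input on every repetition and then has to backtrack, so as you increase the number of words you'll see performance degrade.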
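To see how the repetition count maps onto words, the pattern can be tried outside the validator. This is a Python sketch, not .NET: it assumes `re.fullmatch` is a fair stand-in for the validator's implicit ^ and $, and `within_word_limit` is a hypothetical helper with the upper bound as a parameter. The inputs are kept short on purpose.

```python
import re


def within_word_limit(text: str, max_words: int) -> bool:
    # Hypothetical helper. Each repetition of (\S*\s*) consumes at most one
    # run of non-whitespace followed by whitespace, so N words need at least
    # N repetitions. fullmatch() stands in for the validator's ^ and $.
    pattern = r"(\S*\s*){0,%d}" % max_words
    return re.fullmatch(pattern, text) is not None


print(within_word_limit("one two three", 3))       # True
print(within_word_limit("one two three four", 3))  # False
print(within_word_limit("", 3))                    # True (0 words)
print(within_word_limit("  one two", 2))           # False
```

The last call shows a quirk: leading whitespace uses up a repetition of its own, which appears to be the leading-space bug the next answer mentions.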
Here is a quick reference for regular expressions:
http://msdn.microsoft.com/en-us/library/az24scfc.aspx
You can use this site to test the expressions:
http://regexpal.com/
Here is my regex example that enforces both a minimum and a maximum word count (and fixes a bug with leading whitespace):
^\s*(\S+\s+|\S+$){10,150}$
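A quick check of the minimum, the maximum and the leading-whitespace case. This is a Python sketch under the assumption that Python's `re` treats this pattern the same way the validator does; `word_count_in_range` and `words` are hypothetical helpers.

```python
import re

# Pattern from the answer above: optional leading whitespace, then 10 to 150
# "words", each followed either by whitespace or by the end of the input.
WORD_RANGE = re.compile(r"^\s*(\S+\s+|\S+$){10,150}$")


def word_count_in_range(text: str) -> bool:
    # fullmatch() stands in for the validator requiring the whole value to match
    return WORD_RANGE.fullmatch(text) is not None


def words(n: int) -> str:
    # Test helper: n copies of "word" separated by single spaces
    return " ".join(["word"] * n)


print(word_count_in_range(words(10)))          # True
print(word_count_in_range(words(9)))           # False
print(word_count_in_range(words(150)))         # True
print(word_count_in_range(words(151)))         # False
print(word_count_in_range("   " + words(10)))  # True: leading spaces are fine
```

Because `\S+` must be followed by whitespace or the end of the input, a word cannot be split across repetitions, so each repetition counts exactly one word.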
Check this site:
http://lawrence.ecorp.net/inet/samples/regexp-validate.php#count
It's a JavaScript regex, but JavaScript's regex syntax is very similar to what ASP.NET uses.
It's something like this:
(\b[a-z0-9]+\b.*){4,}
I made the following regex:
(\d{5}|\d-\d{4}|\d{2}-\d{3}|\d{3}-\d{2}|\d{4}-\d)
And it seems to work. That is, it will match a 5-digit number, or a 5-digit number with exactly one hyphen in it, where the hyphen can't be the first or last character.
I would like a similar regex, but for a 25-digit number. If I use the same tactic as above, the regex will be very long.
Can anyone suggest a simpler regex?
Additional Notes:
I'm putting this regex into an XML file which is consumed by an ASP.NET application. I don't have access to the .NET back-end code, but I suspect it does something like this:
Match match = Regex.Match("Something goes here", "my regex", RegexOptions.None);
You need to use a lookahead:
^(?:\d{25}|(?=\d+-\d+$)[\d\-]{26})$
Explanation:
Either it's \d{25} from start to end, 25 digits.
Or: it is 26 characters of [\d\-] (digits or hyphens) AND it matches \d+-\d+, meaning it has exactly one hyphen, somewhere in the middle.
Working example with test cases
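The test cases below are a Python sketch of the behaviour described above, assuming Python's `re` handles `\d` and the lookahead the same way .NET does for plain ASCII digits. The sample strings are made up for illustration.

```python
import re

# Pattern from the answer above: exactly 25 digits, or 26 characters of
# digits/hyphens shaped like digits-hyphen-digits (one hyphen, never at an end).
KOBI = re.compile(r"^(?:\d{25}|(?=\d+-\d+$)[\d\-]{26})$")

samples = {
    "1" * 25: True,                                   # 25 digits, no hyphen
    "1" * 12 + "-" + "1" * 13: True,                  # one hyphen in the middle
    "1-" + "1" * 24: True,                            # hyphen after the first digit
    "-" + "1" * 25: False,                            # leading hyphen
    "1" * 25 + "-": False,                            # trailing hyphen
    "1" * 10 + "-" + "1" * 5 + "-" + "1" * 9: False,  # two hyphens
    "1" * 26: False,                                  # too many digits
    "1" * 24: False,                                  # too few digits
}

for value, expected in samples.items():
    assert (KOBI.fullmatch(value) is not None) == expected, value
print("all samples behave as expected")
```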
You could use this regex:
^[0-9](?:(?=[0-9]*-[0-9]*$)[0-9-]{24}|[0-9]{23})[0-9]$
The lookahead makes sure there's only one hyphen, and the character class makes sure there are 23 digits between the first and the last. It could probably be made shorter, though.
EDIT: A 'bit' shorter one xP
^(?:[0-9]{25}|(?=[^-]+-[^-]+$)[0-9-]{26})$
A bit similar to Kobi's though, I admit.
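A quick way to confirm that the edit didn't change behaviour is to run both versions over the same inputs. This Python sketch assumes Python's `re` behaves like .NET's for these patterns; the candidate strings are made up.

```python
import re

# The two patterns from the answer above: the first version and the shorter edit.
FIRST = re.compile(r"^[0-9](?:(?=[0-9]*-[0-9]*$)[0-9-]{24}|[0-9]{23})[0-9]$")
SHORTER = re.compile(r"^(?:[0-9]{25}|(?=[^-]+-[^-]+$)[0-9-]{26})$")

candidates = [
    "1" * 25,
    "1" * 12 + "-" + "1" * 13,
    "1-" + "1" * 24,
    "-" + "1" * 25,
    "1" * 25 + "-",
    "1" * 10 + "-" + "1" * 5 + "-" + "1" * 9,
    "1" * 26,
    "1" * 12 + "-" + "1" * 12,
]

accepted = [c for c in candidates if FIRST.fullmatch(c)]
# Both versions should accept exactly the same strings
assert accepted == [c for c in candidates if SHORTER.fullmatch(c)]
print(len(accepted))  # 3
```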
If you aren't fussy about the length at all (i.e. you only want a string of digits with an optional hyphen), you could use:
\d+(-\d+)?
(You may want to add line/word boundaries to this, depending on your circumstances)
If you need to have a specific length of match, this pattern doesn't really work. Kobi's answer is probably a better fit for you.
I think the fastest way is to do a simple match and then add up the lengths of the capture groups. Why attempt arithmetic in a regex? It makes no sense.
^(\d+)-?(\d+)$
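A sketch of the match-then-count idea in Python. Note that it needs code running alongside the regex, which the question says isn't available when only the XML is editable. `is_valid_id` is a hypothetical name.

```python
import re

# Pattern from the answer above: two runs of digits with an optional hyphen
# between them. The regex checks only the shape; the length check is done in code.
SHAPE = re.compile(r"^(\d+)-?(\d+)$")


def is_valid_id(value: str, digits: int = 25) -> bool:
    m = SHAPE.fullmatch(value)
    if m is None:
        return False
    # Add up the lengths of the two capture groups instead of counting in the regex
    return len(m.group(1)) + len(m.group(2)) == digits


print(is_valid_id("1" * 25))                   # True
print(is_valid_id("1" * 10 + "-" + "1" * 15))  # True
print(is_valid_id("1" * 24))                   # False
print(is_valid_id("-" + "1" * 25))             # False
```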
This will match 25 digits and exactly one hyphen in the middle:
^(?=(-*\d){25}$)\d.{24}\d$
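The sketch below compares this pattern with and without a `$` at the end of the lookahead. It is Python, assumed to behave like .NET here. Without the `$`, `(-*\d){25}` only has to find 25 digits at the start, so a run of 26 digits with no hyphen also passes.

```python
import re

NO_ANCHOR = re.compile(r"^(?=(-*\d){25})\d.{24}\d$")
ANCHORED = re.compile(r"^(?=(-*\d){25}$)\d.{24}\d$")

hyphenated = "1" * 12 + "-" + "1" * 13
twenty_six_digits = "1" * 26

assert NO_ANCHOR.fullmatch(hyphenated) and ANCHORED.fullmatch(hyphenated)
# Without "$", the lookahead only looks at the first 25 digits,
# so 26 digits slip through.
assert NO_ANCHOR.fullmatch(twenty_six_digits)
assert not ANCHORED.fullmatch(twenty_six_digits)
# Both reject 25 digits with no hyphen, since 26 characters are required.
assert not NO_ANCHOR.fullmatch("1" * 25) and not ANCHORED.fullmatch("1" * 25)
print("checks passed")
```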
I am using the following regex with a .NET validator.
^100|150|200|250|300|350|400|450|500|550|600|650|700|750|800|850|900|950|1000$
The aim is to allow 1 of the values in the list.
However, whilst it works great with most values, inputting '1000' produces an error.
Any ideas?
You need to limit the scope of your alternation:
^(100|150|200|250|300|350|400|450|500|550|600|650|700|750|800|850|900|950|1000)$
And of course you can optimize your regex:
^([1-9][05]0|1000)$
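Why '1000' fails can be reproduced in Python under one assumption, labelled in the code: that the validator takes the first match it finds and accepts the value only if that match starts at position 0 and covers the whole value. With the ungrouped alternation, `^100` wins first on "1000" and only covers three characters. `validator_accepts` is a hypothetical helper.

```python
import re

VALUES = "100|150|200|250|300|350|400|450|500|550|600|650|700|750|800|850|900|950|1000"
BROKEN = "^" + VALUES + "$"   # ^ binds only to 100, $ only to 1000
GROUPED = "^(" + VALUES + ")$"
OPTIMISED = r"^([1-9][05]0|1000)$"


def validator_accepts(pattern: str, value: str) -> bool:
    # Assumption: mimic a validator that takes the first match and checks
    # that it starts at 0 and spans the whole value.
    m = re.search(pattern, value)
    return m is not None and m.start() == 0 and m.end() == len(value)


print(validator_accepts(BROKEN, "1000"))   # False: "^100" matches first
print(validator_accepts(GROUPED, "1000"))  # True

# The optimised pattern accepts exactly 100, 150, ..., 950, 1000.
allowed = set(range(100, 1001, 50))
assert {n for n in range(2000) if validator_accepts(OPTIMISED, str(n))} == allowed
assert {n for n in range(2000) if validator_accepts(GROUPED, str(n))} == allowed
```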
I'm trying to set up a validation expression for an ASP.NET Regular Expression Validator control. It is for validating the creation of a user name, so I want to limit the number of characters, and I also want to prevent users from including spaces. Here's what I've got so far:
^.*(?=.{5,20})(?=.*\w{5,255}).*$
The \w{5,255} part prevents spaces and special characters (except for underscores, apparently). I have no idea how "5,255" makes it work, but it does; I just copied it from somewhere else.
The main problem I'm having is that if the first or last character is a space (or special character), it passes validation, which is not acceptable. Can anyone help me? I'm sure it is something simple, but I know next to nothing about regular expressions.
You can use something simpler like this:
^[a-zA-Z0-9_]{5,255}$
This will allow alphanumeric usernames (plus underscores) between 5 and 255 characters in length.
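A quick check against the cases from the question, including a leading or trailing space. This is a Python sketch, with `fullmatch` assumed as a stand-in for the validator's whole-value check; `is_valid_username` is a hypothetical helper.

```python
import re

USERNAME = re.compile(r"^[a-zA-Z0-9_]{5,255}$")


def is_valid_username(name: str) -> bool:
    # fullmatch() stands in for the validator's whole-value check
    return USERNAME.fullmatch(name) is not None


print(is_valid_username("user_01"))   # True
print(is_valid_username(" user_01"))  # False: leading space
print(is_valid_username("user_01 "))  # False: trailing space
print(is_valid_username("abcd"))      # False: too short
```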
(let's expand overall understanding of how to at least use regex!)
The main reason the posted regex wasn't working is that you were attempting to use lookahead. Lookahead is a 0-length pattern that just guarantees that the next part of the string matches a certain pattern (and is usually used precisely because it is 0-length, so it doesn't expand your capturing group).
Effectively, what your regex (going off of the original /^.(?=.{5,20})(?=.\w{5,255}).*$/) meant was:
^. "The beginning of our line should match any single character (provided it's not a newline, although this depends on the regex implementation as well as flags that may or may not have been passed in)"
(?= "and guarantee that after here"
.{5,20}) "are any 5-20 characters."
(?= "Also, after that same first character (since, remember, lookahead is 0-length), guarantee"
. "one arbitrary character"
\w{5,255}) "and 5-255 word characters."
.*$ "And of course, since all of that exhaustive matching was 0-length, we want the rest of the line to be an arbitrary number of characters."
What you technically could have done to use lookaround was ^(?=\w{5,255}$).{5,255}$, but that's just overly convoluted. I'd suggest just using \w{5,255} or something along those lines.
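The lookaround version only works if the lookahead is anchored with its own `$`; otherwise it just checks that the value starts with five or more word characters. A Python sketch, assuming Python's `\w` behaves like .NET's for these ASCII inputs:

```python
import re

# The lookaround version with and without "$" inside the lookahead.
WITHOUT_END = re.compile(r"^(?=\w{5,255}).{5,255}$")
WITH_END = re.compile(r"^(?=\w{5,255}$).{5,255}$")

value = "abcde f"  # made-up input containing a space

# Without "$", the lookahead is satisfied by the first five word characters
# and ".{5,255}" then accepts the space as well.
print(bool(WITHOUT_END.fullmatch(value)))  # True
print(bool(WITH_END.fullmatch(value)))     # False
print(bool(WITH_END.fullmatch("abcdef")))  # True
```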
Should be validating 1-280 input characters, but it hangs when more than 280 characters are input.
Clarification
I am using the above regex to validate the length of input string to be 280 characters maximum.
I am using asp:RegularExpressionValidator to do that.
There's nothing “wrong” with it per se, but it's horrendous: with most RE engines (you don't say which one you're using), when the first thing the engine tries doesn't match, it backtracks and tries loads of different possibilities, none of which can ever produce a match. So it's not a hang; the machine is just trying to execute something like 2^280 operations to see whether a match is possible. Excuse me if I don't wait around for that!
Of course, it's theoretically possible for the RE compiler to merge the (.|\s) part of the RE into something it doesn't need to backtrack to deal with. Some RE engines do this (typically the more automata-theoretic ones) but many don't (the stack-based ones).
It is trying every possible combination of . and \s for each character trying to find a version of the pattern that matches the string.
. already matches any character except a newline, so (.|\s) is largely redundant. Further, if you just want to check the length of the string, then just do that. Why are you pulling out regexes?
If you really want to use a regular expression, you could use ^.{1,280}$ combined with the Singleline option, so that the . metacharacter will match everything, including newlines (see here, Regular Expression API section).
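A Python sketch of both suggestions. It assumes `re.DOTALL` is a fair counterpart of .NET's `RegexOptions.Singleline`, since both let `.` match newlines. The helper names are hypothetical.

```python
import re

# re.DOTALL lets "." match newlines, so "(.|\s)" is not needed.
UP_TO_280 = re.compile(r"^.{1,280}$", re.DOTALL)


def length_ok_regex(text: str) -> bool:
    return UP_TO_280.fullmatch(text) is not None


def length_ok_plain(text: str) -> bool:
    # The non-regex alternative: just check the length.
    return 1 <= len(text) <= 280


for text in ["", "a" * 280, "a" * 281, "line one\nline two", "\n" * 280]:
    assert length_ok_regex(text) == length_ok_plain(text), repr(text)

# Without DOTALL, "." stops at the newline.
assert re.fullmatch(r".{1,280}", "line one\nline two") is None
print("both checks agree")
```

Because a single `.` covers every character under this flag, there is only one way to match each character, so the backtracking blow-up described above does not arise.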
I'd like to find patterns in a hex file I have and sort them by number of occurrences.
I am not looking for some specific pattern, just to make some statistics of the occurrences happening there and sort them.
DB0DDAEEDAF7DAF5DB1FDB1DDB20DB1BDAFCDAFBDB1FDB18DB23DB06DB21DB15DB25DB1DDB2EDB36DB43DB59DB32DB28DB2ADB46DB6FDB32DB44DB40DB50DB87DBB0DBA1DBABDBA0DB9ADBA6DBACDBA0DB96DB95DBB7DBCFDBCBDBD6DB9CDBB5DB9DDB9FDBA3DB88DB89DB93DBA5DB9CDBC1DBC1DBC6DBC3DBC9DBB3DBB8DBB6DBC8DBA8DBB6DBA2DB98DBA9DBB9DBDBDBD5DBD9DBC3DB9BDBA2DB84DB83DB7DDB6BDB58DB4EDB42DB16DB0DDB01DB02DAFCDAE9DAE5DAD9DAE2DAB7DA9BDAA6DA9EDAAADAC9DACADAC4DA92DA90DA84DA89DA93DAA9DA8CDA7FDA62DA53DA6EDA
That's an excerpt of the HEX file, and as an example I'd like to get:
XX occurrences of BDBDBD
XX occurrences of B93D
Is there a way to mine the file to generate that output?
Sure. Use a sliding window to create the counts (the link is for Perl, but it seems general enough to understand the algorithm). Your patterns are called n-grams. You will have to limit the maximum pattern length, though.
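A minimal sliding-window sketch in Python. It assumes the file is read as hex text and that patterns are counted per hex character, not per byte, so matches like B93D that span byte boundaries are counted too. `count_ngrams` and its length bounds are illustrative.

```python
from collections import Counter


def count_ngrams(data: str, min_len: int, max_len: int) -> Counter:
    """Count every substring of length min_len..max_len with a sliding window."""
    counts: Counter = Counter()
    for n in range(min_len, max_len + 1):
        # Slide a window of width n across the data one character at a time
        for start in range(len(data) - n + 1):
            counts[data[start:start + n]] += 1
    return counts


sample = "DBA9DBB9DBDBDBD5DBD9"  # a short piece of the excerpt above
for pattern, occurrences in count_ngrams(sample, 4, 6).most_common(5):
    print(f"{occurrences} occurrences of {pattern}")
```

The maximum length bounds both the running time and the memory, since each extra length adds roughly one entry per position in the file.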
This is a pretty classic CS problem. The code in general is non-trivial to implement as it will require at least one full parse of the sequence, and depending on your efficiency and memory/processor constraints might require several. See here.
You will need to partition your input string in some way to ensure that you get a good subsequence across it.
If there is a specific problem we might be able to help more, but the general strategy is in the Wikipedia article above.
You can use regular expressions to make a pattern to search for.
The regex needed would be very simple: just use the exact phrase you're searching for. Then there should be a regular expression function in the language you're using (you didn't specify) that can count the number of matches.
Use that to create a simple counter.
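A Python sketch of that counter, with the helper names made up for illustration. One detail matters for data like this: `re.findall` counts non-overlapping matches, so "BDBDBD" contains "BDBD" only once. Wrapping the phrase in a zero-width lookahead counts overlapping occurrences as well.

```python
import re


def count_occurrences(phrase: str, text: str) -> int:
    # re.escape() matches the phrase literally; findall() returns
    # non-overlapping matches.
    return len(re.findall(re.escape(phrase), text))


def count_overlapping(phrase: str, text: str) -> int:
    # A zero-width lookahead matches at every position where the phrase
    # starts, so overlapping occurrences are counted too.
    return len(re.findall("(?=" + re.escape(phrase) + ")", text))


print(count_occurrences("BDBD", "BDBDBD"))  # 1
print(count_overlapping("BDBD", "BDBDBD"))  # 2
```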