Difference between (.|[\r\n]){1,1500} and ^.{1,1500}$ - asp.net

What is the difference between below two regular expressions
(.|[\r\n]){1,1500}
^.{1,1500}$

The first matches up-to-1500 chars, and the second (assuming you haven't set certain regex options) matches a first single line of up-to-1500 chars, with no newlines.

. does not match new lines.

The second one matches the first 1500 characteres of a line IF the line contains 1500 characters or less

First expression matches some <= 1500 characters of the file(or other source).
Second expression matches a entire line with charsNumber <= 1500.
. matches any character except \n newline.

If it's for use in a RegularExpressionValidator, you probably want to use this regex:
^[\s\S]{1,1500}$
This is because the regex may be run on either the server (.NET) or the client (JavaScript). In .NET regexes you can use the RegexOptions.Singleline flag (or its inline equivalent, (?s)) to make the dot match newlines, but JavaScript has no such mechanism.
[\s\S] matches any whitespace character or anything that's not a whitespace character--in other words, anything. It's the most popular idiom for matching anything including a newline in JavaScript; it's much, much more efficient than alternation-based approaches like (.|\n).
Note that you'll still need to use a RequiredFieldValidator if you don't want the user to leave the textbox empty.

Related

Extract mm/dd/yyyy and m/dd/yyyy dates from string in R [duplicate]

My regex pattern looks something like
<xxxx location="file path/level1/level2" xxxx some="xxx">
I am only interested in the part in quotes assigned to location. Shouldn't it be as easy as below without the greedy switch?
/.*location="(.*)".*/
Does not seem to work.
You need to make your regular expression lazy/non-greedy, because by default, "(.*)" will match all of "file path/level1/level2" xxx some="xxx".
Instead you can make your dot-star non-greedy, which will make it match as few characters as possible:
/location="(.*?)"/
Adding a ? on a quantifier (?, * or +) makes it non-greedy.
Note: this is only available in regex engines which implement the Perl 5 extensions (Java, Ruby, Python, etc) but not in "traditional" regex engines (including Awk, sed, grep without -P, etc.).
location="(.*)" will match from the " after location= until the " after some="xxx unless you make it non-greedy.
So you either need .*? (i.e. make it non-greedy by adding ?) or better replace .* with [^"]*.
[^"] Matches any character except for a " <quotation-mark>
More generic: [^abc] - Matches any character except for an a, b or c
How about
.*location="([^"]*)".*
This avoids the unlimited search with .* and will match exactly to the first quote.
Use non-greedy matching, if your engine supports it. Add the ? inside the capture.
/location="(.*?)"/
Use of Lazy quantifiers ? with no global flag is the answer.
Eg,
If you had global flag /g then, it would have matched all the lowest length matches as below.
Here's another way.
Here's the one you want. This is lazy [\s\S]*?
The first item:
[\s\S]*?(?:location="[^"]*")[\s\S]* Replace with: $1
Explaination: https://regex101.com/r/ZcqcUm/2
For completeness, this gets the last one. This is greedy [\s\S]*
The last item:[\s\S]*(?:location="([^"]*)")[\s\S]*
Replace with: $1
Explaination: https://regex101.com/r/LXSPDp/3
There's only 1 difference between these two regular expressions and that is the ?
The other answers here fail to spell out a full solution for regex versions which don't support non-greedy matching. The greedy quantifiers (.*?, .+? etc) are a Perl 5 extension which isn't supported in traditional regular expressions.
If your stopping condition is a single character, the solution is easy; instead of
a(.*?)b
you can match
a[^ab]*b
i.e specify a character class which excludes the starting and ending delimiiters.
In the more general case, you can painstakingly construct an expression like
start(|[^e]|e(|[^n]|n(|[^d])))end
to capture a match between start and the first occurrence of end. Notice how the subexpression with nested parentheses spells out a number of alternatives which between them allow e only if it isn't followed by nd and so forth, and also take care to cover the empty string as one alternative which doesn't match whatever is disallowed at that particular point.
Of course, the correct approach in most cases is to use a proper parser for the format you are trying to parse, but sometimes, maybe one isn't available, or maybe the specialized tool you are using is insisting on a regular expression and nothing else.
Because you are using quantified subpattern and as descried in Perl Doc,
By default, a quantified subpattern is "greedy", that is, it will
match as many times as possible (given a particular starting location)
while still allowing the rest of the pattern to match. If you want it
to match the minimum number of times possible, follow the quantifier
with a "?" . Note that the meanings don't change, just the
"greediness":
*? //Match 0 or more times, not greedily (minimum matches)
+? //Match 1 or more times, not greedily
Thus, to allow your quantified pattern to make minimum match, follow it by ? :
/location="(.*?)"/
import regex
text = 'ask her to call Mary back when she comes back'
p = r'(?i)(?s)call(.*?)back'
for match in regex.finditer(p, str(text)):
print (match.group(1))
Output:
Mary

Regular expression to allow only one space between words

I've a textbox in an ASP.NET application, for which I need to use a regular expression to validate the user input string. Requirements for regex are -
It should allow only one space between words. That is, total number of spaces between words or characters should only be one.
It should ignore leading and trailing spaces.
Matches:
Test
Test abc
Non Matches:
Test abc def
Test abc --> I wanted to include multiple spaces between the 2 words. However the editor ignores these extra spaces while posting a question.
Assuming there must be either one or two 'words' (i.e. sequences of non-space characters)
"\s*\S+(\s\S+)?\s*"
Change \S to [A-Za-z] if you want to allow only letters.
Pretty straightforward:
/^ *(\w+ ?)+ *$/
Fiddle: http://refiddle.com/gls
Maybe this one will do?
\s*\S+?\s?\S*\s*
Edit: Its a server-encoded regex, meaning that you might need to remove one of those escaping slashes.
How about:
^\s*(\w+\s)*\w+\s*$

ASP.NET regular expression to restrict consecutive characters

Using ASP.NET syntax for the RegularExpressionValidator control, how do you specify restriction of two consecutive characters, say character 'x'?
You can provide a regex like the following:
(\\w)\\1+
(\\w) will match any word character, and \\1+ will match whatever character was matched with (\\w).
I do not have access to asp.net at the moment, but take this console app as an example:
Console.WriteLine(regex.IsMatch("hello") ? "Not valid" : "Valid"); // Hello contains to consecutive l:s, hence not valid
Console.WriteLine(regex.IsMatch("Bar") ? "Not valid" : "Valid"); // Bar does not contain any consecutive characters, so it's valid
Alexn is right, this is the way you match consecutive characters with a regex, i.e. (a)\1 matches aa.
However, I think this is a case of everything looking like a nail when you're holding a hammer. I would not use regex to validate this input. Rather, I suggest validating this in code (just looping through the string, comparing str[i] and str[i-1], checking for this condition).
This should work:
^((?<char>\w)(?!\k<char>))*$
It matches abc, but not abbc.
The key is to use so called "zero-width negative lookahead assertion" (syntax: (?! subexpression)).
Here we make sure that a group matched with (?<char>\w) is not followed by itself (expressed with (?!\k<char>)).
Note that \w can be replaced with any valid set of characters (\w does not match white-spaces characters).
You can also do it without named group (note that the referenced group has number 2):
^((\w)(?!\2))*$
And its important to start with ^ and end with $ to match the whole text.
If you want to only exclude text with consecutive x characters, you may use this
^((?<char>x)(?!\k<char>)|[^x\W])*$
or without backreferences
^(x(?!x)|[^x\W])*$
All syntax elements for .NET Framework Regular Expressions are explained here.
You can use a regex to validate what's wrong as well as what's right of course. The regex (.)\1 will match any two consecutive characters, so you can just reject any input that gives an IsValid result to that. If this is the only validation you need, I think this way is far easier than trying to come up with a regex to validate correct input instead.

Regex: Match opening/closing chars with spaces

I'm trying to complete a regular expression that will pull out matches based on their opening and closing characters, the closest I've gotten is
^(\[\[)[a-zA-Z.-_]+(\]\])
Which will match a string such as "[[word1]]" and bring me back all the matches if there is more than one, The problem is I want it to pick up matchs where there may be a space in so for example "[[word1 word2]]", now this will work if I add a space into my pattern above however this pops up a problem that it will only get one match for my entire string so for example if I have a string
"Hi [[Title]] [[Name]] [[surname]], How are you"
then the match will be [[Title]] [[Name]] [[surname]] rather than 3 matches [[Title]], [[Name]], [[surname]]. I'm sure I'm just a char or two away in the Regex but I'm stuck, How can I make it return the 3 matches.
Thanks
You just need to make you regex non-greedy by using a ? like:
^(\[\[)[a-zA-Z.-_ ]+?(\]\])
Also there is a bug in your regex. You've included - in the char class thinking of it as a literal hyphen. But - in a char class is a meta char. So it effectively will match all char between . (period) and _ (underscore). So you need to escape it as:
^(\[\[)[a-zA-Z.\-_ ]+?(\]\])
or you can put is in some other place in the regex so that it will not have things on both sides of it as:
^(\[\[)[a-zA-Z._ -]+?(\]\])
or
^(\[\[)[-a-zA-Z._ ]+?(\]\])
You need to turn off greedy matching. See these examples for different languages:
asp.net
java
javascript
You should use +? instead of +.
The one without the question mark will try to match as much as possible, while the one with the question mark as little as possible.
Another approach would be to use [^\]] as your characters instead of [a-zA-Z.-_]. That way, a match will never extend over your closing brackets.

Don't want spaces in the text, but this regex is passing not sure why

I am using the following regex
/[a-zA-Z0-9]+/i.test(value)
If I enter a space in the word, it passes.
I don't see where spaces are aloud in the regex, why is it passing?
You need to set the beginning and end bounderies so that the entire string must match the regular expression, otherwise it'll look for any match (which in this case is one or more of the characters specified).
Try this:
/^[a-zA-Z0-9]+$/i.test(value)
Because you haven't anchored it.
For these sorts of tests, it's typically safer to make sure you don't have the negated character class:
/[^a-zA-Z0-9]/

Resources