IIS UrlRewrite RegEx differences with .NET RegEx - asp.net

I have a valid RegEx pattern in .NET:
(?>.*param1=value1.*)(?<!.*param2=\d+.*) which matches if:
query string contains param1=value1
but does not contain param2= a number
It works in .NET. However, IIS URL Rewrite complains that it is not a valid pattern.
Can I not use zero-width negative lookbehind (?<! ) expressions with IIS URL Rewrite?
Note that I tried to apply this pattern both in web.config (properly changing < and > to &lt; and &gt; respectively) and in IIS Manager, all without success.

The default regex syntax in IIS URL Rewrite is ECMAScript, which is not compatible with .NET regex syntax. See the URL Rewrite Module Configuration Reference:
ECMAScript – Perl compatible (ECMAScript standard compliant) regular expression syntax. This is a default option for any rule.
You cannot use a lookbehind at all; you will have to rely on lookaheads only:
^(?!.*param2=\d).*param1=value1.*
Pattern explanation:
^ - start of string
(?!.*param2=\d) - negative lookahead: if param2= followed by a digit (\d) appears after zero or more characters other than a newline, the match fails (no match is returned)
.*param1=value1.* - match a whole line that contains param1=value1
You can enhance this rule by adding word boundaries (\b) around param1=value1 so that it only matches as a whole word.
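To sanity-check the lookahead-only pattern outside IIS, it can be run through another engine that supports lookaheads. The sketch below uses Python's re module as a stand-in (an assumption: for this particular pattern its lookahead behaves the same way as the ECMAScript flavor IIS uses), and the query strings are made up for illustration; the rule should still be verified in IIS itself.

```python
import re

# Lookahead-only version of the rule: fail if "param2=" is followed by a digit,
# otherwise require "param1=value1" somewhere in the string.
PATTERN = re.compile(r"^(?!.*param2=\d).*param1=value1.*")


def rule_matches(query: str) -> bool:
    """Return True when the rewrite condition would match this query string."""
    return PATTERN.match(query) is not None


# Hypothetical query strings used only to exercise the pattern.
samples = {
    "param1=value1": True,
    "a=1&param1=value1&b=2": True,
    "param1=value1&param2=42": False,
    "param2=7&param1=value1": False,
    "param1=value1&param2=abc": True,  # param2 is not a number, so it is allowed
    "param1=other": False,
}

for query, expected in samples.items():
    print(f"{query!r}: {rule_matches(query)} (expected {expected})")
```

Because the lookahead is anchored at ^ and scans ahead with .*, the position of param2 relative to param1 does not matter, which is what the original lookbehind was trying to express.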

Related

Extract entire string following specific characters & trouble with str_extract() [duplicate]

For example, this regex
(.*)<FooBar>
will match:
abcde<FooBar>
But how do I get it to match across multiple lines?
abcde
fghij<FooBar>
Try this:
((.|\n)*)<FooBar>
It basically says "any character or a newline" repeated zero or more times.
It depends on the language, but there should be a modifier that you can add to the regex pattern. In PHP it is:
/(.*)<FooBar>/s
The s at the end causes the dot to match all characters including newlines.
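The two answers above can be compared side by side. This is a small sketch in Python, chosen here only as an example language: re.DOTALL is Python's name for the s modifier, and the sample string is the one from the question.

```python
import re

text = "abcde\nfghij<FooBar>"

# Without a flag, "." stops at the newline, so only the last line is captured.
plain = re.search(r"(.*)<FooBar>", text).group(1)

# Alternation with \n, as in the first answer.
alternation = re.search(r"((.|\n)*)<FooBar>", text).group(1)

# DOTALL (the s modifier) lets "." match the newline too.
dotall = re.search(r"(.*)<FooBar>", text, flags=re.DOTALL).group(1)

print(repr(plain))        # 'fghij'
print(repr(alternation))  # 'abcde\nfghij'
print(repr(dotall))       # 'abcde\nfghij'
```

Both approaches capture the same text; the flag is usually the cleaner choice where the engine offers one.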
The question is, can the . pattern match any character? The answer varies from engine to engine. The main difference is whether the pattern is used by a POSIX or non-POSIX regex library.
A special note about Lua patterns: they are not considered regular expressions, but . matches any character there, the same as in POSIX-based engines.
Another note on matlab and octave: the . matches any character by default (demo): str = "abcde\n fghij<Foobar>"; expression = '(.*)<Foobar>*'; [tokens,matches] = regexp(str,expression,'tokens','match'); (tokens contain a abcde\n fghij item).
Also, in all of Boost's regex grammars the dot matches line breaks by default. Boost's ECMAScript grammar allows you to turn this off with regex_constants::no_mod_s (source).
As for Oracle (it is POSIX-based), use the n option (demo): select regexp_substr('abcde' || chr(10) ||' fghij<Foobar>', '(.*)<Foobar>', 1, 1, 'n', 1) as results from dual
POSIX-based engines:
A mere . already matches line breaks, so there is no need for any modifiers; see bash (demo).
Tcl (demo), PostgreSQL (demo) and R with its default TRE engine (base R without perl=TRUE) (demo) also treat . the same way. For base R with perl=TRUE, or for stringr/stringi patterns, use the (?s) inline modifier.
However, most POSIX-based tools process input line by line, so . does not match line breaks simply because they are not in scope. Here are some examples of how to override this:
sed - There are multiple workarounds. The most precise, but not very safe, is sed 'H;1h;$!d;x; s/\(.*\)<Foobar>/\1/' (H;1h;$!d;x; slurps the file into memory). If whole lines must be included, sed '/start_pattern/,/end_pattern/d' file (removes the range with the matching lines included) or sed '/start_pattern/,/end_pattern/{{//!d;};}' file (removes the range with the matching lines excluded) can be considered.
perl - perl -0pe 's/(.*)<FooBar>/$1/gs' <<< "$str" (-0 slurps the whole file into memory, -p prints the file after applying the script given by -e). Note that using -000pe will slurp the file and activate 'paragraph mode' where Perl uses consecutive newlines (\n\n) as the record separator.
gnu-grep - grep -Poz '(?si)abc\K.*?(?=<Foobar>)' file. Here, z enables file slurping, (?s) enables the DOTALL mode for the . pattern, (?i) enables case insensitive mode, \K omits the text matched so far, *? is a lazy quantifier, (?=<Foobar>) matches the location before <Foobar>.
pcregrep - pcregrep -Mi "(?si)abc\K.*?(?=<Foobar>)" file (M enables file slurping here). Note pcregrep is a good solution for macOS grep users.
See demos.
Non-POSIX-based engines:
php - Use the s (PCRE_DOTALL) modifier: preg_match('~(.*)<Foobar>~s', $s, $m) (demo)
c# - Use the RegexOptions.Singleline flag (demo): var result = Regex.Match(s, @"(.*)<Foobar>", RegexOptions.Singleline).Groups[1].Value; or, with the inline modifier, var result = Regex.Match(s, @"(?s)(.*)<Foobar>").Groups[1].Value;
powershell - Use the (?s) inline option: $s = "abcde`nfghij<FooBar>"; $s -match "(?s)(.*)<Foobar>"; $matches[1]
perl - Use the s modifier (or (?s) inline version at the start) (demo): /(.*)<FooBar>/s
python - Use the re.DOTALL (or re.S) flags or (?s) inline modifier (demo): m = re.search(r"(.*)<FooBar>", s, flags=re.S) (and then if m:, print(m.group(1)))
java - Use Pattern.DOTALL modifier (or inline (?s) flag) (demo): Pattern.compile("(.*)<FooBar>", Pattern.DOTALL)
kotlin - Use RegexOption.DOT_MATCHES_ALL : "(.*)<FooBar>".toRegex(RegexOption.DOT_MATCHES_ALL)
groovy - Use (?s) in-pattern modifier (demo): regex = /(?s)(.*)<FooBar>/
scala - Use (?s) modifier (demo): "(?s)(.*)<Foobar>".r.findAllIn("abcde\n fghij<Foobar>").matchData foreach { m => println(m.group(1)) }
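javascript - Use [^] or the workarounds [\d\D] / [\w\W] / [\s\S] (demo): s.match(/([\s\S]*)<FooBar>/)[1]. Engines that implement ECMAScript 2018 or later also support the s (dotAll) flag: s.match(/(.*)<FooBar>/s)[1]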
c++ (std::regex) Use [\s\S] or the JavaScript workarounds (demo): regex rex(R"(([\s\S]*)<FooBar>)");
vba vbscript - Use the same approach as in JavaScript, ([\s\S]*)<Foobar>. (NOTE: The MultiLine property of the RegExp object is sometimes mistakenly thought to be the option that lets . match across line breaks; in fact, it only changes the ^ and $ behavior to match at the start/end of lines rather than strings, the same as in JavaScript regex.)
ruby - Use the /m MULTILINE modifier (demo): s[/(.*)<Foobar>/m, 1]
r, base-r - For base R PCRE regexps (perl=TRUE), use (?s): regmatches(x, regexec("(?s)(.*)<FooBar>",x, perl=TRUE))[[1]][2] (demo)
r, icu, stringr, stringi - The stringr/stringi regex functions are powered by the ICU regex engine. Also use (?s): stringr::str_match(x, "(?s)(.*)<FooBar>")[,2] (demo)
go - Use the inline modifier (?s) at the start (demo): re := regexp.MustCompile(`(?s)(.*)<FooBar>`)
swift - Use dotMatchesLineSeparators or (easier) pass the (?s) inline modifier to the pattern: let rx = "(?s)(.*)<Foobar>"
objective-c - The same as Swift. (?s) works the easiest, but here is how the option can be used: NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionDotMatchesLineSeparators error:&regexError];
re2, google-apps-script - Use the (?s) modifier (demo): "(?s)(.*)<Foobar>" (in Google Spreadsheets, =REGEXEXTRACT(A2,"(?s)(.*)<Foobar>"))
NOTES ON (?s):
In most non-POSIX engines, the (?s) inline modifier (or embedded flag option) can be used to enforce . to match line breaks.
If placed at the start of the pattern, (?s) changes the behavior of every . in the pattern. If (?s) is placed somewhere after the beginning, only the . characters to the right of it are affected, unless the pattern is passed to Python's re. In Python's re, a global (?s) affects every . in the pattern regardless of its location (and since Python 3.11 such a global flag is only allowed at the very start of the pattern). The (?s) effect is stopped using (?-s). A modifier group can be used to affect only a specified range of a regex pattern (e.g., Delim1(?s:.*?)\nDelim2.* will make the first .*? match across newlines, while the second .* will only match the rest of the line).
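The Delim1/Delim2 example can be checked with a short sketch. It uses Python's re (scoped flag groups like (?s:...) need Python 3.6 or later), and the sample text is invented for illustration.

```python
import re

text = "Delim1 first\nsecond\nDelim2 rest of line\nnext line"

# (?s:...) turns on DOTALL only inside the group: the lazy .*? may cross newlines,
# while the trailing .* (outside the group) stops at the end of the line.
m = re.search(r"Delim1(?s:.*?)\nDelim2.*", text)
print(repr(m.group()))  # 'Delim1 first\nsecond\nDelim2 rest of line'

# A global (?s) at the very start affects every "." in the pattern,
# so the trailing .* now runs to the end of the string.
whole = re.search(r"(?s)Delim1.*?\nDelim2.*", text)
print(repr(whole.group()))
```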
POSIX note:
In non-POSIX regex engines, to match any character, [\s\S] / [\d\D] / [\w\W] constructs can be used.
In POSIX, [\s\S] does not match any character (as it does in JavaScript or any non-POSIX engine), because regex escape sequences are not supported inside bracket expressions. [\s\S] is parsed as a bracket expression that matches a single character: \, s or S.
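For the non-POSIX side of this note, a quick sketch with Python's re (a non-POSIX engine, used here only as an example) shows that each shorthand paired with its complement covers newlines and carriage returns without any flag. The POSIX bracket-expression behavior cannot be shown with this engine.

```python
import re

text = "line one\nline two\r\nline three"

# Each class pairs a shorthand with its complement, so together they cover
# every character, including \n and \r, without needing any flag.
results = {cls: re.fullmatch(cls + "*", text) is not None
           for cls in (r"[\s\S]", r"[\d\D]", r"[\w\W]")}

# A bare "." without DOTALL cannot get past the first newline.
dot_only = re.fullmatch(r".*", text) is not None

print(results, dot_only)
```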
If you're using Eclipse search, you can enable the "DOTALL" option to make '.' match any character including line delimiters: just add "(?s)" at the beginning of your search string. Example:
(?s).*<FooBar>
In many regex dialects, /[\S\s]*<Foobar>/ will do just what you want. Source
([\s\S]*)<FooBar>
The dot matches all characters except line breaks. So use [\s\S], which will match ALL characters.
We can also use
(.*?\n)*?
to match everything including newline without being greedy.
This will make the new line optional
(.*?|\n)*?
In Ruby you can use the 'm' option (multiline):
/YOUR_REGEXP/m
See the Regexp documentation on ruby-doc.org for more information.
"." normally doesn't match line breaks. Most regex engines allow you to add the s flag (also called DOTALL and SINGLELINE) to make "." also match newlines.
If that fails, you could do something like [\S\s].
For Eclipse, the following expression worked:
Foo
jadajada Bar"
Regular expression:
Foo[\S\s]{1,10}.*Bar*
Note that (.|\n)* can be less efficient than (for example) [\s\S]* (if your language's regexes support such escapes) and than finding how to specify the modifier that makes . also match newlines. Or you can go with POSIXy alternatives like [[:space:][:^space:]]*.
Use:
/(.*)<FooBar>/s
The s causes the dot (.) to match newline characters too.
Use RegexOptions.Singleline. It changes the meaning of . to include newlines.
Regex.Replace(content, searchText, replaceText, RegexOptions.Singleline);
In Notepad++ you can use this:
<table (.|\r\n)*</table>
It will match an entire table, including its rows and columns.
You can make it non-greedy (lazy) with the following, so that it matches the first, second and subsequent tables separately rather than all at once:
<table (.|\r\n)*?</table>
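The greedy/lazy difference can be seen on a small sample. This sketch uses Python's re rather than Notepad++ (an assumption: both treat *? as lazy in the same way), replaces the capturing group with (?:...) so that findall returns whole matches, and uses made-up HTML.

```python
import re

html = ("<table id=a>\r\n<tr><td>1</td></tr>\r\n</table>\r\n<p>x</p>\r\n"
        "<table id=b>\r\n<tr><td>2</td></tr>\r\n</table>")

# Greedy: (.|\r\n)* runs as far as it can, so one match spans both tables.
greedy = re.findall(r"<table (?:.|\r\n)*</table>", html)

# Lazy: (.|\r\n)*? stops at the first </table>, so each table is matched separately.
lazy = re.findall(r"<table (?:.|\r\n)*?</table>", html)

print(len(greedy), len(lazy))  # 1 2
```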
In a Java-based regular expression, you can use [\s\S].
This works for me and is the simplest one:
(\X*)<FooBar>
Generally, . doesn't match newlines, so try ((.|\n)*)<foobar>.
In JavaScript you can use [^]* to search for zero or more characters, including line breaks.
$("#find_and_replace").click(function() {
var text = $("#textarea").val();
var search_term = new RegExp("[^]*<Foobar>", "gi");
var replace_term = "Replacement term";
var new_text = text.replace(search_term, replace_term);
$("#textarea").val(new_text);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<button id="find_and_replace">Find and replace</button>
<br>
<textarea ID="textarea">abcde
fghij<Foobar></textarea>
Solution:
Using the pattern modifiers sU gives the desired matching in PHP (s lets . match newlines, and U makes quantifiers lazy by default).
Example:
preg_match('/(.*)/sU', $content, $match);
Sources:
Pattern Modifiers
In the context of use within languages, regular expressions act on strings, not lines. So you should be able to use the regex normally, assuming that the input string has multiple lines.
In this case, the given regex will match the entire string, since "<FooBar>" is present. Depending on the specifics of the regex implementation, the $1 value (obtained from the "(.*)") will either be "fghij" or "abcde\nfghij". As others have said, some implementations allow you to control whether the "." will match the newline, giving you the choice.
Line-based regular expression use is usually for command line things like egrep.
Try .*\n*.*<FooBar>, assuming you also want to allow blank lines, since you are allowing any characters (including none) before <FooBar>.
I had the same problem and solved it in probably not the best way but it works. I replaced all line breaks before I did my real match:
mystring = Regex.Replace(mystring, "\r\n", "")
I am manipulating HTML so line breaks don't really matter to me in this case.
I tried all of the suggestions above with no luck. I am using .NET 3.5 FYI.
I wanted to match a particular if block in Java:
...
...
if(isTrue){
doAction();
}
...
...
}
If I use the regExp
if \(isTrue(.|\n)*}
it included the closing brace for the method block, so I used
if \(!isTrue([^}.]|\n)*}
to exclude the closing brace from the wildcard match.
Often we have to modify a substring with a few keywords spread across lines preceding the substring. Consider an XML element:
<TASK>
<UID>21</UID>
<Name>Architectural design</Name>
<PercentComplete>81</PercentComplete>
</TASK>
Suppose we want to change the 81 to some other value, say 40. First identify <UID>21</UID>, then skip all characters, including \n, up to <PercentComplete>. The regular expression pattern and the replacement specification are:
public class ReplaceDemo {
    public static void main(String[] args) {
        String hw = "<TASK>\n <UID>21</UID>\n <Name>Architectural design</Name>\n <PercentComplete>81</PercentComplete>\n</TASK>";
        String pattern = "(<UID>21</UID>)((.|\n)*?)(<PercentComplete>)(\\d+)(</PercentComplete>)";
        // Note that the group (<PercentComplete>) is $4 and the group ((.|\n)*?) is $2;
        // "$440" is read as $4 followed by the literal 40, because there is no group 44.
        String replaceSpec = "$1$2$440$6";
        String iw = hw.replaceFirst(pattern, replaceSpec);
        System.out.println(iw);
    }
}
<TASK>
<UID>21</UID>
<Name>Architectural design</Name>
<PercentComplete>40</PercentComplete>
</TASK>
The subgroup (.|\n) is group $3, which the replacement spec never uses. If we make it non-capturing with (?:.|\n), then $3 is (<PercentComplete>). So the pattern and replaceSpec can also be:
pattern = new String("(<UID>21</UID>)((?:.|\n)*?)(<PercentComplete>)(\\d+)(</PercentComplete>)");
replaceSpec = new String("$1$2$340$5");
and the replacement works correctly as before.
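The "$440" trick depends on how the engine reads group numbers in the replacement string. As an illustration in another language (Python, chosen only as an example), the \g<n> syntax avoids the ambiguity entirely; Python's documentation notes that \g<2>0 is unambiguous while \20 would be read as group 20. The sample XML is the one from the answer.

```python
import re

hw = ("<TASK>\n <UID>21</UID>\n <Name>Architectural design</Name>\n"
      " <PercentComplete>81</PercentComplete>\n</TASK>")

# Same idea as the non-capturing Java version: (?:.|\n) keeps the group numbers 1..5.
pattern = r"(<UID>21</UID>)((?:.|\n)*?)(<PercentComplete>)(\d+)(</PercentComplete>)"

# \g<3> is an explicit reference to group 3, so the literal "40" that follows
# cannot be mistaken for part of the group number.
iw = re.sub(pattern, r"\g<1>\g<2>\g<3>40\g<5>", hw, count=1)
print(iw)
```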
Searching for three consecutive lines in PowerShell typically looks like this:
$file = Get-Content file.txt -raw
$pattern = 'lineone\r\nlinetwo\r\nlinethree\r\n' # "Windows" text
$pattern = 'lineone\nlinetwo\nlinethree\n' # "Unix" text
$pattern = 'lineone\r?\nlinetwo\r?\nlinethree\r?\n' # Both
$file -match $pattern
# output
True
Bizarrely, this would be Unix text at the prompt, but Windows text in a file:
$pattern = 'lineone
linetwo
linethree
'
Here's a way to print out the line endings:
'lineone
linetwo
linethree
' -replace "`r",'\r' -replace "`n",'\n'
# Output
lineone\nlinetwo\nlinethree\n
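The "Both" pattern works because \r? makes the carriage return optional. A minimal sketch of the same check in Python (used here as a stand-in for PowerShell's .NET engine; the sample strings are invented):

```python
import re

windows_text = "lineone\r\nlinetwo\r\nlinethree\r\n"
unix_text = "lineone\nlinetwo\nlinethree\n"

# \r? makes the carriage return optional, so one pattern covers both line-ending styles.
pattern = re.compile(r"lineone\r?\nlinetwo\r?\nlinethree\r?\n")

print(bool(pattern.search(windows_text)), bool(pattern.search(unix_text)))  # True True

# The "Unix" pattern alone does not match Windows text, because \r sits before each \n.
unix_only = re.compile(r"lineone\nlinetwo\nlinethree\n")
print(bool(unix_only.search(windows_text)))  # False
```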
Option 1
One way would be to use the s flag (just like the accepted answer):
/(.*)<FooBar>/s
Demo 1
Option 2
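A second way would be to use any of the following patterns. The m (multiline) flag shown with them is not actually required: it only changes how ^ and $ behave, and [\s\S], [\d\D] and [\w\W] already match newlines without any flag: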
/([\s\S]*)<FooBar>/m
or
/([\d\D]*)<FooBar>/m
or
/([\w\W]*)<FooBar>/m
Demo 2
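The difference between the m and s flags is easy to miss. This sketch uses Python's re (an assumption: its MULTILINE and DOTALL flags correspond to m and s here) on the question's sample string.

```python
import re

text = "abcde\nfghij<FooBar>"

# MULTILINE only changes ^ and $; "." still stops at the newline.
multiline = re.search(r"(.*)<FooBar>", text, flags=re.MULTILINE).group(1)

# DOTALL is the flag that lets "." cross the newline.
dotall = re.search(r"(.*)<FooBar>", text, flags=re.DOTALL).group(1)

# [\s\S] matches any character with or without flags.
char_class = re.search(r"([\s\S]*)<FooBar>", text).group(1)

print(repr(multiline), repr(dotall), repr(char_class))
```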

Why is my url rewriting rule not working properly?

I'm trying to write this redirection
images/catalog/1002/10002/main-200x250.12345.jpg to url images/catalog/1002/10002/main.jpg?w=200&h=250&vw=main
I tried this rule:
rewrite "^/images/(.*)/([a-z0-9]+)-([0-9])x([0-9]).([0-9]{5}).(jpg|jpeg|png|gif|ico)$" /images/$1/$2.$6?w=$3&h=$4&vw=$2 break;
It does not work; it returns a 404 Not Found error. I don't know what I'm missing.
Also, when I remove the double quotes (") I get this error:
directive "rewrite" is not terminated by ";"
I also don't clearly see what the double quotes are for, or when I should use or avoid them.
I'm working on a Mac with MAMP Pro v5.2.2.
You forgot to add a quantifier for the width and height numbers in your regex. Try this (I added a + twice; you might want to use {X} instead, where X is the number of digits in each number, if it is always the same). You may also want to escape the two unescaped dots (\.) so that they only match a literal dot:
rewrite "^/images/(.*)/([a-z0-9]+)-([0-9]+)x([0-9]+).([0-9]{5}).(jpg|jpeg|png|gif|ico)$" /images/$1/$2.$6?w=$3&h=$4&vw=$2 break;
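To see why the quantifier matters, the two patterns can be tried against the example URI. This sketch uses Python's re as a stand-in for nginx's PCRE engine (an assumption that is safe for this simple pattern) and rewrites nginx's $1..$6 as Python template references.

```python
import re

# Corrected pattern from the answer (digits now use + quantifiers).
PATTERN = re.compile(
    r"^/images/(.*)/([a-z0-9]+)-([0-9]+)x([0-9]+).([0-9]{5}).(jpg|jpeg|png|gif|ico)$"
)
# The rewrite target /images/$1/$2.$6?w=$3&h=$4&vw=$2 in Python template syntax.
TARGET = r"/images/\1/\2.\6?w=\3&h=\4&vw=\2"

uri = "/images/catalog/1002/10002/main-200x250.12345.jpg"
m = PATTERN.match(uri)
result = m.expand(TARGET) if m else None
print(result)  # /images/catalog/1002/10002/main.jpg?w=200&h=250&vw=main

# The original single-digit version cannot match "200x250".
ORIGINAL = re.compile(
    r"^/images/(.*)/([a-z0-9]+)-([0-9])x([0-9]).([0-9]{5}).(jpg|jpeg|png|gif|ico)$"
)
print(ORIGINAL.match(uri))  # None
```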
Your regular expression needs to be quoted because there is a } in it.
The nginx documentation for the rewrite directive explains when a regular expression needs to be quoted:
If a regular expression includes the “}” or “;” characters, the whole expressions should be enclosed in single or double quotes.

Why does searching for .* not work in UltraEdit?

In UltraEdit I enabled UNIX-style regular expressions, but finding .* does not work; only .+ will find something.
Why, and how can I make it work?
I should add that I am working with UltraEdit 11.10b. Is there a known bug or something?
. matches any character except carriage return and line-feed.
* matches preceding expression 0 or more times, but non greedy.
Non-greedy means as few characters as possible to get a positive result for the expression.
The expression .* makes sense only between two fixed strings. You cannot use .* on its own, because matching nothing is also a positive match for this expression: any character except line terminators, zero times, is enough for a positive result, so searching for just .* always matches nothing. In other words, finding nothing is a positive result for the regular expression .*.
Also, word.* and .*word are not useful, as with those expressions only word is found, or you get unpredictable results.
With .* or .+ in a search string, the find engine always needs a fixed string before and after, or a zero-width anchor like ^ or $, to know where to start and stop selecting any characters except line terminators zero (respectively one) or more times.
By the way: the Unix regular expression engine of UE v11.10b is just the UltraEdit regular expression engine used with Perl regular expression syntax. This also explains why the Unix engine supports only what the UltraEdit engine supports, just with a different set of special characters. You should consider upgrading UltraEdit to the current version 21.10, which includes a real Perl regular expression engine with all its powerful capabilities.
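The "an empty match is already a success" argument can be illustrated outside UltraEdit. Python's * is greedy by default, so this sketch uses the lazy .*? to imitate the non-greedy behavior the answer describes for UltraEdit 11.10b's engine (an assumption; that engine itself is not modeled here). The sample text is invented.

```python
import re

text = "alpha beta gamma"

# A lone lazy .*? succeeds immediately with an empty match.
empty_match = re.search(r".*?", text).group()
print(repr(empty_match))  # ''

# With fixed strings on both sides, .*? has to expand until "gamma" matches.
bounded = re.search(r"alpha.*?gamma", text).group()
print(repr(bounded))  # 'alpha beta gamma'

# A lazy .*? at the end of a pattern adds nothing: only the fixed part is found.
trailing = re.search(r"beta.*?", text).group()
print(repr(trailing))  # 'beta'
```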

How to restrict some file types and allow all others in Regular expression Asp.net

I want to restrict a few file types and allow all others in a regular expression validation expression. What I have tried is to specify some allowed and some restricted file types, but I want to specify only the restricted file types and allow all other types to be uploaded.
I am using regular expression with asp.net file upload control. My regular expression looks like this right now
^.*\.(csv|xlsx|xls|doc|docx|pdf|txt|zip|(?!exe)|(?!bat)|(?!msi))$
It's working fine and restricting exe, bat and msi, but I want to allow all other file formats.
Use a negative lookbehind:
/^.*(?<!\.(exe|bat|msi))$/i
The negative lookaheads you're using aren't helping you. At all. They're trivially true because you're trying to match them at the end of the string, and lookarounds don't consume anything, so the last position in the string can't have exe or bat after it.
A step by step explanation, for posterity's sake:
^
Match the start of the string, as I'm sure you know.
.*
consume the whole string.
(?<! ... )
Look back and make sure we haven't consumed....
\.
A literal dot, followed by...
(exe|bat|msi)
any of our verboten file types.
$
then match the end of the string.
I also chose to make it case insensitive.
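A quick way to try the lookbehind version is with Python's re, which also supports fixed-width lookbehinds (every alternative here is three characters long). In this sketch the capturing group is swapped for (?:...), re.IGNORECASE stands in for the /i flag, and the file names are made up.

```python
import re

# Negative lookbehind from the answer: the name must not end in .exe, .bat or .msi.
ALLOWED = re.compile(r"^.*(?<!\.(?:exe|bat|msi))$", re.IGNORECASE)


def is_allowed(filename: str) -> bool:
    return ALLOWED.fullmatch(filename) is not None


for name in ("report.pdf", "data.csv", "archive.tar.gz", "notes",
             "setup.EXE", "run.bat", "installer.msi"):
    print(name, is_allowed(name))
```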
Edit, for js:
/^(?:(?!\.(exe|bat|msi)$).)*/i
Moar different explanation:
^
Top of string
(
Start group
.
Arbitrary Character
(?!...)
Negative lookahead. Not followed by:
\.
Literal dot.
(exe|bat|msi)
Forbidden File types.
$
End of string
)*
Close group and match that an arbitrary number of times.
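Whether the JavaScript version needs a trailing $ depends on how it is applied. To my understanding, ASP.NET's validator accepts a value only when the match covers the whole input, so the pattern works there as written; with a plain "is there a match?" check, it would accept every name. This Python sketch (Python standing in for JavaScript, with made-up file names) shows both forms.

```python
import re

# With a trailing $ the whole name must be consumed, so a forbidden
# extension makes the match fail.
ANCHORED = re.compile(r"^(?:(?!\.(?:exe|bat|msi)$).)*$", re.IGNORECASE)

# Without it, the pattern still matches the part before ".exe".
UNANCHORED = re.compile(r"^(?:(?!\.(?:exe|bat|msi)$).)*", re.IGNORECASE)

print(bool(ANCHORED.search("report.pdf")), bool(ANCHORED.search("setup.exe")))
print(repr(UNANCHORED.search("setup.exe").group()))  # 'setup'
```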

How to escape a RegEx that generates an “unexpected quantifier” error on IE?

I use ASP.NET and C#. I have a TextBox with a validation control that uses a regex.
I use this code as validation.
ValidationExpression="^(?s)(.){4,128}$"
But only in IE9 do I receive an error, "unexpected quantifier", from the JavaScript section.
Probably I have to escape my RegEx but I do not have any idea how to do it and what to escape.
Could you help me with a sample of code? Thanks
Write it like this instead:
^([\s\S]){4,128}$
I suspect that (?s) is the cause of the error.
Three problems: (1) JavaScript doesn't support inline modifiers like (?s), (2) there's no other way to pass modifiers in an ASP validator, and (3) neither of those facts matters, because JavaScript (at least in IE9 and other pre-ES2018 engines) doesn't support single-line mode. Most people use [\s\S] to match anything-including-newlines in JavaScript regexes.
EDIT: Here's how it would look in your case:
ValidationExpression="^[\s\S]{4,128}$"
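The effect of swapping . for [\s\S] can be checked on a few sample values. This sketch uses Python's re (an assumption: its "." also skips newlines without a flag, like JavaScript's), and fullmatch stands in for the ^...$ anchors of the validator expression.

```python
import re

ANY_CHAR = re.compile(r"[\s\S]{4,128}")  # newlines count as ordinary characters
DOT = re.compile(r".{4,128}")            # "." stops at a newline


def valid(pattern, value: str) -> bool:
    # fullmatch mirrors the ^...$ anchors of the validator expression
    return pattern.fullmatch(value) is not None


for value in ("abcd", "ab\ncd", "abc", "x" * 129):
    print(repr(value[:10]), valid(ANY_CHAR, value), valid(DOT, value))
```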
[\s\S] is a character class that matches any whitespace character (\s) or any character that's not a whitespace character--in other words, any character. The dot (.) metacharacter matches any character except a newline. Most regex flavors (like .NET's) support a "Singleline" or "DOTALL" mode that makes the dot match newlines, too, but not JavaScript.
As far as I know, JavaScript doesn't understand (?s); instead, you can replace . with [^] or [\s\S].
For example: ^[^]{4,128}$
