What does the "(?i)" mean in Regex? - asp.net

The regex pattern (?i)(?<=<data name=")\w+(?=") captures "test" from:
<data name="test" xml:space="preserve">
<value>123</value>
</data>
But what does the "(?i)" mean in regex?

It's a way of specifying that the matching should be case insensitive.
Here's the MSDN page on Regex options:
By applying inline options in a regular expression pattern with the syntax (?imnsx-imnsx). The option applies to the pattern from the point that the option is defined to either the end of the pattern or to the point at which the option is undefined by another inline option.
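As a rough illustration of the inline option, here is a minimal Python sketch (the question is about .NET, so this is an analogy, not the .NET API). The uppercase sample tag is made up for the demo. One difference worth noting: Python only accepts the global form (?i) at the very start of the pattern, and uses the scoped form (?i:...) to limit the option to part of the pattern.

```python
import re

# Hypothetical input: same tag as in the question, but in upper case.
text = '<DATA NAME="test" xml:space="preserve">'

# Without the flag, the lowercase pattern does not match the uppercase tag.
case_sensitive = re.search(r'(?<=<data name=")\w+(?=")', text)

# (?i) at the start makes the whole pattern case-insensitive.
case_insensitive = re.search(r'(?i)(?<=<data name=")\w+(?=")', text)

# (?i:...) applies the option only inside the group.
scoped = re.search(r'(?i:<data name=)"(\w+)"', text)

print(case_sensitive)            # None
print(case_insensitive.group())  # test
print(scoped.group(1))           # test
```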
But really, it looks like you're processing XML, in which case, you should really be using an XML parser, not regular expressions. There are classes built into the framework for working with XML which properly respect all of the rules of XML. Treating XML as "just a string" tends to lead to brittle solutions.
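For comparison, a small sketch of the parser-based approach, written in Python with the standard library's ElementTree rather than the .NET classes the answer refers to. It reads the same attribute without any regex.

```python
import xml.etree.ElementTree as ET

xml_text = """<data name="test" xml:space="preserve">
<value>123</value>
</data>"""

# Parse the fragment and read the attribute and child text directly.
element = ET.fromstring(xml_text)
name = element.get("name")
value = element.findtext("value")

print(name, value)  # test 123
```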

Related

Regex for date pattern with optionnal year [duplicate]

I have this gigantic ugly string:
J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM
J0000010: Project name: E:\foo.pf
J0000011: Job name: MBiek Direct Mail Test
J0000020: Document 1 - Completed successfully
I'm trying to extract pieces from it using regex. In this case, I want to grab everything after Project Name up to the part where it says J0000011: (the 11 is going to be a different number every time).
Here's the regex I've been playing with:
Project name:\s+(.*)\s+J[0-9]{7}:
The problem is that it doesn't stop until it hits the J0000020: at the end.
How do I make the regex stop at the first occurrence of J[0-9]{7}?
Make .* non-greedy by adding '?' after it:
Project name:\s+(.*?)\s+J[0-9]{7}:
Using a non-greedy quantifier here is probably the best solution, and it is also more efficient than the greedy alternative: a greedy match goes as far as it can (here, to the end of the text!) and then backtracks character by character to try to match the part that comes afterwards.
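The difference can be checked with a short Python sketch. It assumes the log is one long line with the records separated by single spaces, which is what the described behaviour (running on to J0000020:) suggests.

```python
import re

# Assumption: the whole log is a single line, records separated by spaces.
log = (
    r"J0000000: Transaction A0001401 started on 8/22/2008 9:49:29 AM "
    r"J0000010: Project name: E:\foo.pf "
    r"J0000011: Job name: MBiek Direct Mail Test "
    r"J0000020: Document 1 - Completed successfully"
)

# Greedy: .* runs to the end, then backtracks to the LAST J-number.
greedy = re.search(r"Project name:\s+(.*)\s+J[0-9]{7}:", log)

# Non-greedy: .*? expands one character at a time and stops at the FIRST J-number.
lazy = re.search(r"Project name:\s+(.*?)\s+J[0-9]{7}:", log)

print(greedy.group(1))  # E:\foo.pf J0000011: Job name: MBiek Direct Mail Test
print(lazy.group(1))    # E:\foo.pf
```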
However, consider using a negative character class instead:
Project name:\s+(\S*)\s+J[0-9]{7}:
\S means "everything except whitespace", and this is exactly what you want.
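One caveat with this variant, sketched in Python: because \S cannot cross whitespace, it only works while the project path contains no spaces. The second sample path below is invented to show that case.

```python
import re

pattern = re.compile(r"Project name:\s+(\S*)\s+J[0-9]{7}:")

simple = r"J0000010: Project name: E:\foo.pf J0000011: Job name: Test"
# Hypothetical path containing a space.
spaced = r"J0000010: Project name: E:\my docs\foo.pf J0000011: Job name: Test"

simple_match = pattern.search(simple)
spaced_match = pattern.search(spaced)

print(simple_match.group(1))  # E:\foo.pf
print(spaced_match)           # None
```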
Well, ".*" is a greedy selector. You make it non-greedy by using ".*?". With the latter construct, the regex engine will, at every step where it matches text into the ".", attempt to match whatever may come after the ".*?". This means that if, for instance, nothing comes after the ".*?", then it matches nothing.
Here's what I used. s contains your original string. This code is .NET specific, but most flavors of regex will have something similar.
string m = Regex.Match(s, @"Project name: (?<name>.*?) J\d+").Groups["name"].Value;
I would also recommend experimenting with regular expressions using "Expresso", a great (and free) utility for regex editing and testing.
One of its upsides is that its UI exposes a lot of regex functionality that people inexperienced with regex might not be familiar with, in a way that makes these new concepts easy to learn.
For example, when building your regex using the UI, and choosing "*", you have the ability to check the checkbox "As few as possible" and see the resulting regex, as well as test its behavior, even if you were unfamiliar with non-greedy expressions before.
Available for download at their site:
http://www.ultrapico.com/Expresso.htm
Express download:
http://www.ultrapico.com/ExpressoDownload.htm
(Project name:\s+[A-Z]:(?:\\\w+)+\.[a-zA-Z]+\s+J[0-9]{7})(?=:)
This will work for you.
Using (?:\\\w+)+\.[a-zA-Z]+ instead of .* makes the pattern more restrictive.

Encode only spaces in jmeter GUI

I read a csv file for input in my jmeter test plan. I name the first variable in the row query.
I need it to encode spaces as %20, not +. Using the __urlencode() function like ${__urlencode(${query})} encodes the spaces as +, the same way selecting the encode option on the parameter does.
I don't think this is something you really want, as encoding the URL is not only about spaces.
You should use the encodeURIComponent() function (or its equivalent). Calling it in JMeter via the __javaScript function looks like:
${__javaScript(encodeURIComponent("${query}"),)}
If you just need to replace spaces with %20, you can do it with the __groovy() function like:
${__groovy(vars.get('query').replaceAll(' '\, '%20'),)}
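To make the three behaviours concrete outside JMeter, here is a rough Python analogy using urllib.parse. The sample query is invented, and quote(..., safe="") is only an approximate stand-in for encodeURIComponent(), which leaves a few extra characters such as ! and * unescaped.

```python
from urllib.parse import quote, quote_plus

# Hypothetical value of the "query" variable.
query = "red shoes & socks"

# Form-style encoding: spaces become "+".
plus_style = quote_plus(query)

# Percent-encoding of every reserved character: spaces become "%20".
percent_style = quote(query, safe="")

# Replacing only the spaces leaves other reserved characters untouched.
spaces_only = query.replace(" ", "%20")

print(plus_style)     # red+shoes+%26+socks
print(percent_style)  # red%20shoes%20%26%20socks
print(spaces_only)    # red%20shoes%20&%20socks
```

The last line shows why replacing only spaces can be risky: the bare & would be read as a parameter separator in a query string.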
See the Apache JMeter Functions - An Introduction article for more information on the JMeter Functions concept.

What is an example usage of <url-modifier> at CSS url() function?

The specification describes a <url-modifier> in section 3.4, "Resource Locators: the <url> type":
A URL is a pointer to a resource and is a functional notation
denoted by <url>. The syntax of a <url> is:
<url> = url( <string> <url-modifier>* )
In addition to the syntax defined above, a <url> can sometimes be
written in other ways:
For legacy reasons, a <url> can be written without quotation marks around the URL itself. This syntax is specially-parsed, and
produces a <url-token> rather than a function syntactically.
[CSS3SYN]
Some CSS contexts, such as @import, allow a <url> to be represented by a <string> instead. This behaves identically to
writing a url() function containing that string. Because these
alternate ways of writing a <url> are not functional notations, they
cannot accept any <url-modifier>s.
Note: The special parsing rules for the legacy quotation mark-less
<url> syntax means that parentheses, whitespace characters, single
quotes (') and double quotes (") appearing in a URL must be escaped
with a backslash, e.g. url(open\(parens), url(close\)parens).
Depending on the type of URL, it might also be possible to write these
characters as URL-escapes (e.g. url(open%28parens) or
url(close%29parens)) as described in [URL]. (If written as a
normal function containing a string, ordinary string escaping rules
apply; only newlines and the character used to quote the string need
to be escaped.)
and in section 3.4.2, "URL Modifiers":
The url() function supports specifying additional <url-modifier>s,
which change the meaning or the interpretation of the URL somehow. A
<url-modifier> is either an <ident> or a function.
This specification does not define any <url-modifier>s, but other
specs may do so.
See also CSS Values and Units Module Level 3
Editor’s Draft, 21 March 2016
What are example usages of an <ident> and a function in url()?
What are the differences between a <string>, an <ident>, and a function in url()?
A <url-modifier> is either an <ident> or a function.
<ident> is an identifier.
A portion of the CSS source that has the same syntax as an <ident-token>.
I could not find any examples of <ident> used within the url() function, but
as mentioned in this email, there are some possible future uses:
Fetch options to control CORS/cookies/etc.
Working with Subresource Integrity
Looking at the <ident> syntax, you cannot use a key/value pair, so I assume
most of this would be implemented using a function, which does not yet exist. Resource hinting, however, could be implemented using an <ident> (hypothetical syntax):
.foo {
background-image: url("//aa.com/img.svg" prefetch);
}
I did, however, find a document in "A Collection of Interesting Ideas" that defines a function <url-modifier>.
SVG Parameters (not official spec)
The param() function is a <url-modifier>
.foo {
background-image: url("//aa.com/img.svg" param(--color var(--primary-color)));
}

Rewriting a URL with RegEx

I have a RegEx problem. Consider the following URL:
http://ab.cdefgh.com/aa-BB/index.aspx
I need a regular expression that looks at "aa-BB" and, if it doesn't
match a number of specific values, say:
rr-GG
vv-VV
yy-YY
zz-ZZ
then the URL should redirect to some place. For example:
http://ab.cdefgh.com/notfound.aspx
In web.config I have urlrewrite rules. I need to know what
the regex would be between the tags.
<urlrewrites>
<rule>
<url>?</url>
<rewrite>http://ab.cdefgh.com/notfound.aspx</rewrite>
</rule>
</urlrewrites>
Assuming you don't care about the potential for the replacement pattern to be in the domain name or some other level of the directory structure, this should select on the pattern you're interested in:
http:\/\/ab\.cdefgh\.com\/(?:aa\-BB|rr\-GG|vv\-VV|yy\-YY|zz\-ZZ)\/index\.aspx
where the aa-BB, etc. patterns are simply "or"ed together using the | operator.
To break this apart further: the . characters need to be escaped with a \ so that the regex matches a literal dot rather than any character. The / and - characters are escaped here as well; that is harmless, though most engines only require it where / is the pattern delimiter or - appears inside a character class. The (?: notation groups the alternatives being "or"ed without storing the match in a backreference (this is slightly more efficient if you don't need the selected value).
Here is a link to a demonstration where you can play around with the regex to get exactly the character combinations you want:
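A quick way to sanity-check the alternation is a small Python sketch. The is_allowed helper and the test URLs are made up for illustration; the pattern is the one above with the optional escapes dropped.

```python
import re

# Same alternation as above; only the dots need escaping in Python.
allowed = re.compile(
    r"http://ab\.cdefgh\.com/(?:aa-BB|rr-GG|vv-VV|yy-YY|zz-ZZ)/index\.aspx"
)

def is_allowed(url: str) -> bool:
    """Return True when the whole URL matches one of the listed codes."""
    return allowed.fullmatch(url) is not None

print(is_allowed("http://ab.cdefgh.com/aa-BB/index.aspx"))  # True
print(is_allowed("http://ab.cdefgh.com/qq-QQ/index.aspx"))  # False
print(is_allowed("http://abXcdefgh.com/aa-BB/index.aspx"))  # False
```

The last call fails only because the dots are escaped; with an unescaped . the X would be accepted.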
http://rubular.com/r/UfB65UyYrj
Will this help?
^([a-z])\1-([A-Z])\2.*
It matches:
uu-FF/
aa-BB/
bb-CC/index
But not
aaBB
asdf
ba-BB
aA-BB
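The backreference pattern can be checked against those samples with a few lines of Python (a sketch; \1 and \2 behave the same way in most regex flavors):

```python
import re

# \1 repeats the captured lowercase letter, \2 the captured uppercase letter.
pattern = re.compile(r"^([a-z])\1-([A-Z])\2.*")

samples = ["uu-FF/", "aa-BB/", "bb-CC/index", "aaBB", "asdf", "ba-BB", "aA-BB"]
matched = [s for s in samples if pattern.match(s)]

print(matched)  # ['uu-FF/', 'aa-BB/', 'bb-CC/index']
```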
(Edit based on comment)
Just pipe-delimit your desired URLs inside () and escape any special characters.
Eg.
^(xx-YY|yy-ZZ|aa-BB|goodStuff)/.*
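But I think you might actually want the following, which matches anything other than the URLs that you specify, so that everything else goes to notfound.aspx. Note that wrapping the alternatives in [^...] would create a character class that excludes single characters, not whole values, so a negative lookahead is used instead:
^(?!(?:xx-YY|yy-ZZ|aa-BB|goodStuff)/).*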
Assuming you want anything but xx-XX, yy-YY and zz-ZZ to redirect, use a negative lookahead (a [^...] character class only excludes single characters):
^(?!xx-XX|yy-YY|zz-ZZ)
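A minimal Python sketch of the "everything except these values" idea, using a negative lookahead. It assumes the rule is matched against a path relative to the site root, which depends on the rewrite module in use; the sample paths are invented.

```python
import re

# Match any path that does NOT start with one of the allowed codes.
not_allowed = re.compile(r"^(?!(?:rr-GG|vv-VV|yy-YY|zz-ZZ)/).*")

# Hypothetical request paths.
paths = [
    "rr-GG/index.aspx",
    "aa-BB/index.aspx",
    "zz-ZZ/index.aspx",
    "yz-GG/index.aspx",
]

# These are the paths that would be sent to notfound.aspx.
redirect = [p for p in paths if not_allowed.match(p)]

print(redirect)  # ['aa-BB/index.aspx', 'yz-GG/index.aspx']
```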

Resources