Regex to Change Particular Text to CAPS - asp.net

I have a string like this:
<PolygonHotSpot PostBackValue="M001" AlternateText="small letters" Coordinates="93, 57, 94" />
I need a regex that captures only the AlternateText value and converts it to caps, e.g. SMALL LETTERS, so that the string becomes:
<PolygonHotSpot PostBackValue="M001" AlternateText="SMALL LETTERS" Coordinates="93, 57, 94" />
I've tried several things, but none of them worked.

People here are not fans of using regex to parse HTML. With all the usual warnings about doing so, here's the general regex approach. Someone else may give you the DOM-parser alternative (if so, use that).
Use this regex: AlternateText="([^"]*). It captures the text you want into Group 1.
In the replacement, use a lambda (a MatchEvaluator in .NET) that rebuilds the match with Group 1 converted to upper case.
I could help you with C#, but don't know the ASP.NET syntax. Someone else can give you the details. :)
Explanation
[^"] is a negated character class that matches any character that is not a double quote
the * quantifier matches zero or more of those
the (parentheses) capture that match to Group 1
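In C#, the lambda would be a MatchEvaluator passed to Regex.Replace(input, pattern, m => ...). Because this page mixes languages, here is a minimal sketch of the same technique in Java (9+). It uses Matcher.replaceAll with a Function<MatchResult, String> and the pattern from the answer. The class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AltTextUpper {
    // Group 1 captures everything after AlternateText=" up to (not including) the next double quote.
    private static final Pattern ALT_TEXT = Pattern.compile("AlternateText=\"([^\"]*)");

    static String upperAltText(String input) {
        Matcher m = ALT_TEXT.matcher(input);
        // The lambda rebuilds the matched prefix and upper-cases only Group 1.
        // quoteReplacement keeps any '$' or '\' in the value from being read as a group reference.
        return m.replaceAll(mr -> Matcher.quoteReplacement(
                "AlternateText=\"" + mr.group(1).toUpperCase(Locale.ROOT)));
    }

    public static void main(String[] args) {
        String s = "<PolygonHotSpot PostBackValue=\"M001\" AlternateText=\"small letters\" Coordinates=\"93, 57, 94\" />";
        System.out.println(upperAltText(s));
    }
}
```

The closing quote is deliberately left out of the match, so it is untouched by the replacement.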

Related

regex to capture pattern with non-fixed length lookbehind - split string

I'd like to split the string into the following:
S <- "No. Ok (whatever). If you must. Please try to be careful (shakes head)."
[1] No.
[2] Ok (whatever). If you must.
[3] Please try to be careful (shakes head).
The pattern is the first . before each (...).
I'm familiar with (?<=...) (i.e. positive lookbehind), but it doesn't seem to work with non-fixed-length patterns. I'd like to know whether I'm wrong about positive lookbehind or whether there's some regex magic to do this. Thanks!
Note that I don't know much about Ruby, but there should be something like a split method that takes a regex pattern as the delimiter and splits the string accordingly.
Use this regex:
(?<=\.) (?=[^.]+?\(.+?\))
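This matches a single space character. Immediately before the space there must be a dot, (?<=\.). After it, (?=...) requires a run of characters that are not dots, [^.]+?, followed by a pair of parentheses with something inside, \(.+?\). The lookbehind only needs to be one character wide. The variable-length part lives in the lookahead, which has no fixed-length restriction.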
Try it online: https://regex101.com/r/8PcbFJ/1
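Here is a minimal Java sketch of the split, assuming the goal is just the three pieces shown in the question. The class and method names are hypothetical. Java's String.split accepts the same lookbehind/lookahead delimiter unchanged, apart from string escaping.

```java
final class SplitBeforeParens {
    // Split on a space that follows a dot (fixed-width lookbehind) and is followed by
    // non-dot characters and then a parenthesised group (variable-length lookahead).
    static final String DELIMITER = "(?<=\\.) (?=[^.]+?\\(.+?\\))";

    static String[] splitSentences(String s) {
        return s.split(DELIMITER);
    }

    public static void main(String[] args) {
        String s = "No. Ok (whatever). If you must. Please try to be careful (shakes head).";
        String[] parts = splitSentences(s);
        for (int i = 0; i < parts.length; i++) {
            System.out.println("[" + (i + 1) + "] " + parts[i]);
        }
    }
}
```

The space after "(whatever)." is not a split point. What follows it ("If you must") reaches a dot before any opening parenthesis, so the lookahead fails.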

Regex for two words in ASP.NET?

Basically, I'm trying to write a regex that accepts two words and nothing else; the words can contain any letters, but not numbers.
I've currently got:
^[a-zA-Z+#-.0-9]/s^[a-zA-Z+#-.0-9]$
I know this is wrong: it doesn't allow two words separated by a space, and it also allows numbers.
Does anybody know what Regex code I need to get this working?
This should do the trick! :)
//Allow letters, numbers and underscores
^\w+\s\w+$
//Allow only letters
^[a-zA-Z]+\s+[a-zA-Z]+$
While a pattern using the \w shorthand character class might work for you, you specifically wrote that the words "can contain any letters, but not numbers", so you'd have to use:
^[a-zA-Z]+\s+[a-zA-Z]+$
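Your expression allows numbers (and + and any of these characters: #$%&*()-',.) because you included all of them in your character class [a-zA-Z+#-.0-9]. That class means lowercase and uppercase letters, the + sign, any ASCII character from # to . (which includes $%&*()-',), and the digits 0-9. It also fails on two words for three reasons. Neither character class has a quantifier, so each matches only one character. /s is a literal slash followed by s, not the whitespace class \s. The second ^ can never match in the middle of the string.
The shorthand character class \w allows letters, digits, and underscore (_).
I might recommend running through a short regex tutorial before deciding it's the solution for you...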
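To see the difference between the two suggested patterns, here is a small Java sketch that runs both against a few sample inputs. The class and method names are hypothetical. For ASCII input, Java's Pattern treats these constructs the same way .NET does. In .NET, however, \w also matches non-ASCII letters and digits, while Java's \w is ASCII-only by default.

```java
import java.util.regex.Pattern;

final class TwoWords {
    // Letters only: one or more ASCII letters, whitespace, one or more ASCII letters.
    static final Pattern LETTERS_ONLY = Pattern.compile("^[a-zA-Z]+\\s+[a-zA-Z]+$");
    // \w also accepts digits and underscore, so this pattern is looser.
    static final Pattern WORD_CHARS = Pattern.compile("^\\w+\\s+\\w+$");

    static boolean lettersOnly(String s) { return LETTERS_ONLY.matcher(s).matches(); }

    static boolean wordChars(String s) { return WORD_CHARS.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"hello world", "hello world2", "hello", "one two three"}) {
            System.out.printf("%-15s lettersOnly=%b wordChars=%b%n", "\"" + s + "\"", lettersOnly(s), wordChars(s));
        }
    }
}
```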
How about something like:
^\w+\s+\w+$

Regular Expression to remove contents in string

I have a string as below:
4s: and in this <em>new</em>, 5s: <em>year</em> everybody try to make our planet clean and polution free.
Desired result:
4s: and in this <em>new</em>, <em>year</em> everybody try to make our planet clean and polution free.
What I want: if the string has two <em> tags, and the gap between them is just one word of the form ns: (where n is a number up to 4 digits long), then I want to remove that ns: from the string. Any punctuation marks ('?', '.', ',') between the two <em> tags should stay as they are.
Note also that the input string may or may not have punctuation marks between the two <em> tags.
My regular expression is below:
Regex.Replace(txtHighlight, @"</em>.(\s*)(\d*)s:(\s*).<em", "</em> <em");
I hope my requirement is clear.
How can I do this using regular expressions?
Not really sure what you need, but how about:
Regex.Replace(txtHighlight, @"</em>(.)\s*\d+s:\s*(.)<em", "</em>$1$2<em");
If you just want to take out the 4s:/5s: bit, you could do something like this:
Regex.Replace(txtHighlight, @"\s\d+s:", "");
This matches a whitespace character followed by one or more digits, the letter s, and a colon.
If that's not what you're after, my apologies. I hope it might help :)
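Both answers can be checked against the sample string from the question. This is a Java sketch rather than the C# Regex.Replace calls above, and the class and method names are hypothetical. It uses the corrected second pattern \s\d+s:. Java's $1/$2 replacement syntax behaves the same as .NET's here.

```java
import java.util.regex.Pattern;

final class StripSecondsMarker {
    // First answer: keep the character right after </em> and right before <em,
    // and drop the digits + "s:" (plus surrounding whitespace) between them.
    private static final Pattern BETWEEN_EM = Pattern.compile("</em>(.)\\s*\\d+s:\\s*(.)<em");
    // Second answer, corrected: whitespace, one or more digits, "s", colon.
    private static final Pattern SECONDS_MARKER = Pattern.compile("\\s\\d+s:");

    static String betweenEmTags(String text) {
        return BETWEEN_EM.matcher(text).replaceAll("</em>$1$2<em");
    }

    static String anyMarker(String text) {
        // Removes every whitespace-prefixed "<digits>s:" in the text, not only those between <em> tags.
        return SECONDS_MARKER.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "4s: and in this <em>new</em>, 5s: <em>year</em> everybody try to make our planet clean and polution free.";
        System.out.println(betweenEmTags(input));
        System.out.println(anyMarker(input));
    }
}
```

The leading "4s:" survives the second approach only because nothing precedes it. A marker elsewhere in the text would be removed even if it were not between <em> tags.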

Why do greater than and less than symbols match in the following regex?

I'm trying to limit the punctuation that a user can enter into a text box and am using this regex:
^[\w ,-–\[\\\^\$\.\|\?\*\+\(\)\{\}/!@#&\`\.'\n\r\f\t""’]*$
Why do > and < produce a match? They are not included in the regex.
NOTE: this is being used in an ASP.NET RegularExpressionValidator.
Edit: here's the ASP.NET markup:
<input runat="server" type="text" id="txt_FName" class="textbox" maxlength="60" />
<asp:RegularExpressionValidator ID="rfvRegexFName" runat="server" ControlToValidate="txt_FName" ErrorMessage="<%$ Resources:Subscribe, inputValidationError %>" />
In the code-behind I add the expression:
rfvRegexFName.ValidationExpression = @"^[\w ,-–\[\\\^\$\.\|\?\*\+\(\)\{\}/!@#&\`\.'\n\r\f\t""’]*$";
Why do > and < produce a match?
Probably because the - (hyphen) in ,-– creates a character range from , (U+002C) to – (en dash, U+2013). That range covers every code point in between, including < (U+003C) and > (U+003E). Either escape the hyphen (,\-–) or place it at the very start or end of the class, so that it matches a literal - instead.
Also note that you need not escape $, ., |, ?, *, +, (, ), { and } inside a character class.
Edit: After seeing the other answers, it looks like there might have been a few things going on here. The main problem was the unescaped dash, though. For anyone reading this Q/A thread in the future, see Bart Kiers' answer.
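The range problem is easy to reproduce with a cut-down version of the class. This Java sketch keeps only \w, space, comma, hyphen and en dash from the original pattern; the class and field names are hypothetical. An unescaped hyphen between two characters forms a range in .NET too, so the same behaviour is expected there.

```java
import java.util.regex.Pattern;

final class HyphenRange {
    // Unescaped hyphen between ',' (U+002C) and the en dash (U+2013) forms a range
    // that covers every code point in between, including '<' (U+003C) and '>' (U+003E).
    static final Pattern BROKEN = Pattern.compile("^[\\w ,-\u2013]*$");
    // Escaping the hyphen makes ',', '-' and the en dash three separate literal characters.
    static final Pattern FIXED = Pattern.compile("^[\\w ,\\-\u2013]*$");

    public static void main(String[] args) {
        for (String s : new String[] {"a<b>", "a-b", "a\u2013b"}) {
            System.out.println(s + " broken=" + BROKEN.matcher(s).matches()
                    + " fixed=" + FIXED.matcher(s).matches());
        }
    }
}
```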
You don't need to escape the period. Inside brackets it already matches a literal period, not any character as it does outside a class. Escaping it is redundant but harmless: \. inside a class is still a literal period, so it does not explain the < and > matches.
Try this:
^[\w ,-–\[\\\^\$.\|\?\*\+\(\)\{\}/!@#&\`'\n\r\f\t""’]*$
Try changing the last * to a +. With *, zero occurrences are allowed, so an empty input is also accepted; + requires at least one character.
Edit to add: Are all of those characters regular ASCII? It looks like you might be using an en dash (–, U+2013), which might be related to your problem.

ASP.NET regular expression to restrict consecutive characters

Using ASP.NET syntax for the RegularExpressionValidator control, how do you disallow two consecutive occurrences of a character, say 'x'?
You can provide a regex like the following:
(\w)\1+
(\w) will match any word character, and \1+ will match one or more repetitions of whatever character was matched by (\w).
I do not have access to ASP.NET at the moment, but take this console app as an example:
var regex = new Regex(@"(\w)\1+"); // requires using System.Text.RegularExpressions;
Console.WriteLine(regex.IsMatch("hello") ? "Not valid" : "Valid"); // "hello" contains two consecutive l's, hence not valid
Console.WriteLine(regex.IsMatch("Bar") ? "Not valid" : "Valid"); // "Bar" has no consecutive repeated characters, so it's valid
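For comparison, here is the same reject-on-match check written in Java; the class name is hypothetical. Matcher.find() searches anywhere in the input, which corresponds to .NET's Regex.IsMatch. By contrast, matches() would require the whole string to match.

```java
import java.util.regex.Pattern;

final class RejectDoubles {
    // (\w)\1+ : a word character followed by one or more copies of itself.
    private static final Pattern DOUBLE = Pattern.compile("(\\w)\\1+");

    // find() looks for the pattern anywhere, so any hit means the input is invalid.
    static boolean isValid(String s) {
        return !DOUBLE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("hello") ? "Valid" : "Not valid"); // prints Not valid ("ll")
        System.out.println(isValid("Bar") ? "Valid" : "Not valid");   // prints Valid
    }
}
```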
Alexn is right: this is how you match consecutive characters with a regex, e.g. (a)\1 matches aa.
However, I think this is a case of everything looking like a nail when you're holding a hammer. I would not use regex to validate this input. Rather, I suggest validating this in code (just looping through the string, comparing str[i] and str[i-1], checking for this condition).
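The loop suggested above might look like this minimal Java sketch; the method names are hypothetical. It compares UTF-16 code units with charAt. That is fine for ASCII input, but a surrogate pair would be treated as two characters.

```java
final class AdjacentCheck {
    // True when no character equals the one immediately before it (compares s[i] with s[i-1]).
    static boolean hasNoConsecutiveRepeats(String s) {
        for (int i = 1; i < s.length(); i++) {
            if (s.charAt(i) == s.charAt(i - 1)) {
                return false;
            }
        }
        return true;
    }

    // Same loop, but only a doubled occurrence of one specific character (e.g. 'x') is rejected.
    static boolean hasNoDoubled(String s, char c) {
        for (int i = 1; i < s.length(); i++) {
            if (s.charAt(i) == c && s.charAt(i - 1) == c) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasNoConsecutiveRepeats("hello"));
        System.out.println(hasNoDoubled("axxb", 'x'));
    }
}
```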
This should work:
^((?<char>\w)(?!\k<char>))*$
It matches abc, but not abbc.
The key is the so-called "zero-width negative lookahead assertion" (syntax: (?!subexpression)).
Here we make sure that a character matched with (?<char>\w) is not followed by itself (expressed with (?!\k<char>)).
Note that \w can be replaced with any valid set of characters (\w does not match whitespace characters).
You can also do it without a named group (note that the referenced group is number 2, because the outer parentheses are group 1):
^((\w)(?!\2))*$
It's important to start with ^ and end with $ so that the whole text is matched.
If you only want to exclude text with consecutive x characters, you can use this:
^((?<char>x)(?!\k<char>)|[^x\W])*$
or, without backreferences:
^(x(?!x)|[^x\W])*$
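The three whole-string patterns can be exercised with a short Java sketch; the class name is hypothetical. Java supports the same named-group, \k<name> backreference and negative-lookahead syntax as .NET, so the patterns are used unchanged apart from string escaping.

```java
import java.util.regex.Pattern;

final class NoRepeatPatterns {
    // Named group + backreference inside a negative lookahead: each \w must not be followed by itself.
    static final Pattern NAMED = Pattern.compile("^((?<char>\\w)(?!\\k<char>))*$");
    // Same idea with a numbered group; the inner (\w) is group 2 because the outer (...) is group 1.
    static final Pattern NUMBERED = Pattern.compile("^((\\w)(?!\\2))*$");
    // Only 'x' may not repeat; [^x\W] is any word character other than x.
    static final Pattern ONLY_X = Pattern.compile("^(x(?!x)|[^x\\W])*$");

    static boolean ok(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "abbc", "axbx", "axxb"}) {
            System.out.println(s + " named=" + ok(NAMED, s) + " numbered=" + ok(NUMBERED, s)
                    + " onlyX=" + ok(ONLY_X, s));
        }
    }
}
```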
All syntax elements for .NET Framework regular expressions are explained in Microsoft's Regular Expression Language Quick Reference.
Of course, you can use a regex to detect what's wrong as well as what's right. The regex (.)\1 matches any two consecutive identical characters, so you can simply reject any input that matches it. If this is the only validation you need, I think this is far easier than trying to write a regex that validates correct input instead.
