ASP.net validator regular expression and accented names / characters - asp.net

I have an ASP.NET control that uses a regular expression to validate the user's input for first name and last name. It works for up to 40 characters, and judging by the expression it also allows ' for names like O'Donald, and maybe hyphenated names too.
ValidationExpression="^[a-zA-Z''-'\s]{1,40}$"
My problem is with accented names/characters: Spanish and French names that contain, for example, ñ are rejected. Does anyone know how to modify my expression to take this into account?

You want
\p{L}: any kind of letter from any language.
From regular-expressions.info
\p{L} (written \pL in some regex flavors; .NET requires the braced form) matches every character in the Unicode table that has the property "letter". So it will match every letter from the Unicode table.
You can use this within your character class like this
ValidationExpression="^[\p{L}''-'\s]{1,40}$"
Working C# test:
String[] words = { "O'Conner", "Smith", "Müller", "fooñ", "Fooobar12" };
foreach (String s in words) {
Match word = Regex.Match(s, @"
^ # Match the start of the string
[\p{L}''-'\s]{1,40}
$ # Match the end of the string
", RegexOptions.IgnorePatternWhitespace);
if (word.Success) {
Console.WriteLine(s + ": valid");
}
else {
Console.WriteLine(s + ": invalid");
}
}
Console.ReadLine();
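One detail worth checking in the expression above: inside a character class, '-' is read as a range from ' to ', so it only matches the apostrophe and a hyphen is not actually allowed. If hyphenated names should pass, a minimal sketch could place the hyphen last in the class, where it is a literal. The class name NameValidator and method name IsValidName are made up for illustration and are not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class NameValidator
{
    // \p{L} = any Unicode letter, ' = apostrophe, \s = whitespace.
    // The trailing '-' is a literal hyphen because it is last in the class.
    private static readonly Regex NamePattern =
        new Regex(@"^[\p{L}'\s-]{1,40}$");

    // Hypothetical helper: true when the input is 1 to 40 allowed characters.
    public static bool IsValidName(string input)
    {
        return input != null && NamePattern.IsMatch(input);
    }
}
```

The same pattern string should work as the ValidationExpression on the server side; whether the browser-side check of the validator understands \p{L} depends on the client's regex engine, so that part is an assumption to verify.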

Related

Remove all whitespace from string AX 2012

The initHeader method of the PurchPackingSlipJournalCreate class has this line:
vendPackingSlipJour.PackingSlipId = purchParmTable.Num;
When I copy and paste ' FDG 2020 ' (all blanks are tab characters) into the Num field and click OK, I want the value written to the PackingSlipId field of the vendPackingSlipJour table as 'FDG2020'.
I tried vendPackingSlipJour.PackingSlipId = strRem(purchParmTable.Num, " ");
but it doesn't work for tab characters.
How can I remove all whitespace characters from a string?
Version 1
Try the strAlpha() function.
From the documentation:
Copies only the alphanumeric characters from a string.
Version 2
Because version 1 also deletes allowed hyphens (-), you could use strKeep().
From the documentation:
Builds a string by using only the characters from the first input string that the second input string specifies should be kept.
This will require you to specify all desired characters, a rather long list...
Version 3
Use regular expressions to replace any unwanted characters (defined as "not a wanted character"). This is similar to version 2, but the list of allowed characters can be expressed much more compactly.
The example below allows alphanumeric characters (a-z, A-Z, 0-9), underscores (_) and hyphens (-). The final value for newText is ABC-12_3.
str badCharacters = @"[^a-zA-Z0-9_-]"; // so NOT an allowed character
str newText = System.Text.RegularExpressions.Regex::Replace(' ABC-12_3 ', badCharacters, '');
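Since the X++ code above calls the .NET Regex class, the same idea can be sketched in C# for the literal question of removing all whitespace, using \s instead of a whitelist. This is only an illustration under that assumption; the class and method names are made up.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceStripper
{
    // Removes every whitespace character (spaces, tabs, line breaks).
    // A null input is treated as an empty string.
    public static string RemoveWhitespace(string input)
    {
        return Regex.Replace(input ?? string.Empty, @"\s+", string.Empty);
    }
}
```

Unlike the whitelist in version 3, this keeps every non-whitespace character, including punctuation.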
Version 4
If you know the only unwanted characters are tabs ('\t'), then you can go hunting for those specifically as well.
vendPackingSlipJour.PackingSlipId = strRem(purchParmTable.Num, '\t');

How to check text contain alphabet and digit

I have a textbox into which the user enters a string.
<td class ="auto-style2" > <asp:TextBox ID="TextBox_PassportCode" runat="server" Width ="100%"></asp:TextBox></td>
and:
string code = TextBox_PassportCode.Text;
I want to check whether "code" contains letters and digits together, e.g. A1234, 1234A or fhg21564.
It does not matter how many letters or digits the user enters, but the textbox must contain both letters and digits.
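Try something like the code below. Since you said the text could be either A1234 or 1234A, the string does not have to start with a letter. You can test against the regular expression [a-zA-Z0-9]+, which matches a run of one or more letters or digits. Be aware that, unanchored, this succeeds as soon as the input contains any single letter or digit, so by itself it does not verify that both letters and digits are present. See MSDN on how to use it.
string code = TextBox_PassportCode.Text;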
string pat = "[a-zA-Z0-9]+";
if (System.Text.RegularExpressions.Regex.IsMatch(code, pat, System.Text.RegularExpressions.RegexOptions.IgnoreCase))
{
System.Console.WriteLine("Match Found");
}
else
{
System.Console.WriteLine("No Match");
}
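To require at least one letter and at least one digit, one option is a pair of lookaheads in front of the character class. The sketch below assumes that nothing other than ASCII letters and digits is allowed, which the question does not state explicitly; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class PassportCodeCheck
{
    // (?=.*[a-zA-Z]) requires a letter somewhere in the input,
    // (?=.*[0-9]) requires a digit somewhere in the input,
    // [a-zA-Z0-9]+ then restricts the input to letters and digits.
    private static readonly Regex LettersAndDigits =
        new Regex(@"^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]+$");

    public static bool HasLettersAndDigits(string code)
    {
        return code != null && LettersAndDigits.IsMatch(code);
    }
}
```

With this check, A1234, 1234A and fhg21564 pass, while abc or 1234 alone do not.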
Use a regular expression for this.
See this answer.
C# Regex to allow only alpha numeric

Find word (not containing substrings) in comma separated string

I'm using a LINQ query where I do something like this:
viewModel.REGISTRATIONGRPS = (From a In db.TABLEA
Select New SubViewModel With {
.SOMEVALUE1 = a.SOMEVALUE1,
...
...
.SOMEVALUE2 = If(commaseparatedstring.Contains(a.SOMEVALUE1), True, False)
}).ToList()
Now my problem is that this doesn't search for words but for substrings, so for example:
commaseparatedstring = "EWM,KI,KP"
SOMEVALUE1 = "EW"
It returns true because "EW" is contained in "EWM".
What I need is to find whole words (not substrings) in the comma-separated string!
Option 1: Regular Expressions
Regex.IsMatch(commaseparatedstring, @"\b" + Regex.Escape(a.SOMEVALUE1) + @"\b")
The \b parts are called "word boundaries" and tell the regex engine that you are looking for a "full word". The Regex.Escape(...) ensures that the regex engine will not try to interpret "special characters" in the text you are trying to match. For example, if you are trying to match "one+two", the Regex.Escape method will return "one\+two".
Also, be sure to import the System.Text.RegularExpressions namespace at the top of your code file.
See Regex.IsMatch Method (String, String) on MSDN for more information.
Option 2: Split the String
You could also try splitting the string which would be a bit simpler, though probably less efficient.
commaseparatedstring.Split(new Char[] { ',' }).Contains( a.SOMEVALUE1 )
what about:
- separating the commaseparatedstring by comma
- calling equals() on each substring instead of contains() on whole thing?
.SOMEVALUE2 = If(commaseparatedstring.Split(',').Contains(a.SOMEVALUE1), True, False)
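The split-and-compare approach can be sketched in C# as a small helper. It assumes the check runs in memory (not translated to SQL by the LINQ provider), that entries may carry stray spaces worth trimming, and that matching is case-sensitive; the names CsvList and ContainsItem are made up for illustration.

```csharp
using System;
using System.Linq;

public static class CsvList
{
    // True only when 'item' equals one whole entry of the comma-separated list.
    public static bool ContainsItem(string commaSeparated, string item)
    {
        if (commaSeparated == null || item == null)
        {
            return false;
        }

        return commaSeparated
            .Split(',')
            .Select(part => part.Trim())            // tolerate "EWM, KI, KP"
            .Contains(item, StringComparer.Ordinal); // exact, case-sensitive match
    }
}
```

With the list "EWM,KI,KP", the item "EW" is rejected and "KI" is accepted.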

What is the regular expression for "No quotes in a string"?

I am trying to write a regular expression that doesn't allow single or double quotes in a string (which could be a single-line or multiline string). Based on my last question, I wrote ^(?:(?!"|').)*$, but it is not working. I would really appreciate it if anybody could help me out here.
Just use a character class that excludes quotes:
^[^'"]*$
(Within the [] character class specifier, the ^ prefix inverts the specification, so [^'"] means any character that isn't a ' or ".)
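A likely reason the original attempt fails on multiline input is that . does not match a line feed unless RegexOptions.Singleline is set, whereas the negated class [^'"] matches line breaks as well. A minimal C# sketch of the negated-class check follows; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class QuoteCheck
{
    // In a verbatim string, "" stands for one double quote,
    // so the pattern the engine sees is ^[^'"]*$
    private static readonly Regex NoQuotes = new Regex(@"^[^'""]*$");

    // True when the text contains neither ' nor ".
    public static bool HasNoQuotes(string text)
    {
        return text != null && NoQuotes.IsMatch(text);
    }
}
```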
Just use a regex that matches for quotes, and then negate the match result:
var regex = new Regex("\"|'");
bool noQuotes = !regex.IsMatch("My string without quotes");
Try this:
string myStr = "foo'baa";
bool HasQuotes = myStr.Contains("'") || myStr.Contains("\""); //faster solution , I think.
bool HasQuotes2 = Regex.IsMatch(myStr, "['\"]");
if (!HasQuotes)
{
// the string contains no quotes
}
The regular expression below allows alphanumeric characters and the listed special characters, but not quotes (' and "):
@"^[a-zA-Z-0-9~+:;,/#&_#*%$!()\[\] ]*$"
You can use it like this:
[RegularExpression(@"^[a-zA-Z-0-9~+:;,/#&_#*%$!()\[\] ]*$", ErrorMessage = "Should not allow quotes")]
Note that [ and ] must be escaped with a backslash (\[ and \]) inside the character class.

How do you convert posted "english" characters from international PC's in ASP.NET? (ex 2205)

I have a WebForm search page that gets occasional hits from international visitors. When they enter in text, it appears to be plain ASCII a-z, 0-9 but they are printed in bold and my "is this text" logic can't handle the input. Is there any easy way in ASP.NET to convert Unicode characters that equate to A-Z, 0-9 into plain old text?
You are getting so-called "Fullwidth Forms" of the characters. In Unicode, these are encoded at codepoints U+FF01 to U+FF5E. To get the ASCII codepoint (U+0021 to U+007E) from them, you have to get their codepoint and subtract (0xFF01 - 0x0021) from it.
ASCII: http://unicode.org/charts/PDF/U0000.pdf
Fullwidth Forms: http://unicode.org/charts/PDF/UFF00.pdf
I don't speak ASP.NET, but in Java the code would look like this:
String decodeFullwidth(String s) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (0xFF01 <= c && c <= 0xFF5E) {
sb.append((char) (c - (0xFF01 - 0x0021)));
} else {
sb.append(c);
}
}
return sb.toString();
}
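For the ASP.NET side, the Java method above ports to C# almost line for line. This is a sketch of the same offset subtraction, not a tested production routine; the class and method names are made up.

```csharp
using System.Text;

public static class FullwidthDecoder
{
    // Distance between the fullwidth block (U+FF01..U+FF5E)
    // and the matching ASCII range (U+0021..U+007E).
    private const int Offset = 0xFF01 - 0x0021;

    public static string Decode(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (c >= '\uFF01' && c <= '\uFF5E')
            {
                sb.Append((char)(c - Offset)); // shift into the ASCII range
            }
            else
            {
                sb.Append(c); // leave every other character untouched
            }
        }
        return sb.ToString();
    }
}
```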
"it appears to be plain ASCII a-z, 0-9 but they are printed in bold"
This could be the Unicode "mathematical bold" characters 𝐚𝐛𝐜𝐝𝐞𝐟𝐠𝐡𝐢𝐣𝐤𝐥𝐦𝐧𝐨𝐩𝐪𝐫𝐬𝐭𝐮𝐯𝐰𝐱𝐲𝐳𝟎𝟏𝟐𝟑𝟒𝟓𝟔𝟕𝟖𝟗. But more likely it's the "fullwidth" characters ａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ０１２３４５６７８９. (These are common in East Asian character encodings: "Fullwidth" refers to being the same width as a Hanzi/Kanji character.)
To convert either set to ASCII, use the Unicode normalization form KC or KD.
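In .NET, compatibility normalization is available through string.Normalize. A minimal sketch, with made-up class and method names, might look like this:

```csharp
using System.Text;

public static class WidthFolding
{
    // Form KC applies compatibility decomposition followed by canonical
    // composition, which maps fullwidth and mathematical bold letters
    // and digits to their plain counterparts.
    public static string ToCompatibilityForm(string input)
    {
        return input == null ? null : input.Normalize(NormalizationForm.FormKC);
    }
}
```

Note that this also folds other compatibility characters (ligatures, superscripts and so on), which may or may not be wanted for a search box.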
You should look at the answer from this question.
It includes the following method (from Michael Kaplan's blog entry "Stripping is an interesting job"):
static string RemoveDiacritics(string stIn) {
string stFormD = stIn.Normalize(NormalizationForm.FormD);
StringBuilder sb = new StringBuilder();
for(int ich = 0; ich < stFormD.Length; ich++) {
UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
if(uc != UnicodeCategory.NonSpacingMark) {
sb.Append(stFormD[ich]);
}
}
return(sb.ToString().Normalize(NormalizationForm.FormC));
}
This will strip all the NonSpacingMark characters from a string. It converts é to e because, after normalization to form D, é is represented as an e followed by a combining ´ character.
The ´ is a "NonSpacingMark", meaning that it is rendered on the previous character. The method detects these special characters and rebuilds the string without them. (This is how I understand it; it might not be entirely accurate.)
This will not work for all Unicode characters, but input from users of a Latin-based character set (English, Spanish, French, German, etc.) will be "cleaned". I have no experience with Asian character sets.
After feedback
I adjusted the routine to the info I got from comments and answers to this question. My current version is:
public static string RemoveDiacritics(string stIn) {
string stFormD = stIn.Normalize(NormalizationForm.FormD);
StringBuilder sb = new StringBuilder();
for (int ich = 0; ich < stFormD.Length; ich++) {
UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
switch (uc) {
case UnicodeCategory.NonSpacingMark:
break;
case UnicodeCategory.DecimalDigitNumber:
sb.Append(CharUnicodeInfo.GetDigitValue(stFormD[ich]).ToString());
break;
default:
sb.Append(stFormD[ich]);
break;
}
}
return (sb
.ToString()
.Normalize(NormalizationForm.FormKC));
}
This routine will remove diacritics (as far as possible) and will convert the other "strange" characters into their "normal" form.
You might try something like this:
Encoding.ASCII.GetString(Encoding.Convert(Encoding.Unicode, Encoding.ASCII, Encoding.Unicode.GetBytes(myString)));
Although, I'm not quite sure what the problem is with the input. What exactly are you doing with the text? Does it matter if it contains more than just ASCII characters? And I especially don't know what you mean by "they are printed in bold".
