How to use Unicode in Regex - asp.net

I am writing a regex to find the rows in a text file that contain certain Unicode characters:
!Regex.IsMatch(colCount.line, @"^[\p{IsBasicLatin}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]+$")
Below is the full code I have written:
var _fileName = @"C:\text.txt";
BadLinesLst = File
.ReadLines(_fileName, Encoding.UTF8)
.Select((line, index) =>
{
var count = line.Count(c => Delimiter == c) + 1;
if (NumberOfColumns < 0)
NumberOfColumns = count;
return new
{
line = line,
count = count,
index = index
};
})
.Where(colCount => colCount.count != NumberOfColumns || (Regex.IsMatch(colCount.line, @"[^\p{IsBasicLatin}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]")))
.Select(colCount => colCount.line).ToList();
The file contains the following rows:
264162-03,66,JITK,2007,12,874.000 ,0.000 ,0.000
6420œ50-00,67,JITK,2007,12,2292.000 ,0.000 ,0.000
4804¥75-00,67,JITK,2007,12,1810.000 ,0.000 ,0.000
If a row of the file contains any character outside BasicLatin, LatinExtended-A or LatinExtended-B, I need to get that row.
The regex above is not working properly: it also returns rows that contain LatinExtended-A or LatinExtended-B characters.

You just need to put the Unicode block classes into a negated character class:
if (Regex.IsMatch(colCount.line,
@"[^\p{IsBasicLatin}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]"))
{ /* Do sth here */ }
This regex finds partial matches (Regex.IsMatch looks for the pattern anywhere inside a larger string). The pattern matches any single character that falls outside the \p{IsBasicLatin}, \p{IsLatinExtended-A} and \p{IsLatinExtended-B} Unicode blocks.
You may also want to check the following code:
if (Regex.IsMatch(colCount.line,
@"^[^\p{IsBasicLatin}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]*$"))
{ /* Do sth here */ }
This returns true if the whole colCount.line string contains no character from the 3 Unicode blocks listed in the negated character class, or if the string is empty (to exclude empty strings, replace * with + at the end).
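For comparison, here is a minimal sketch of the same negated-class filter in Java rather than C#. Java's regex engine spells block classes with an In prefix where .NET uses Is. The class and method names (BlockFilter, linesWithOtherCharacters) are hypothetical and only serve the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BlockFilter {
    // Matches one character that lies outside all three Unicode blocks.
    // Java writes block classes as \p{In...}; .NET writes them as \p{Is...}.
    private static final Pattern OUTSIDE_ALLOWED_BLOCKS = Pattern.compile(
            "[^\\p{InBasicLatin}\\p{InLatinExtended-A}\\p{InLatinExtended-B}]");

    // Keeps only the lines that hold at least one character outside the blocks.
    public static List<String> linesWithOtherCharacters(List<String> lines) {
        return lines.stream()
                .filter(line -> OUTSIDE_ALLOWED_BLOCKS.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "264162-03,66,JITK,2007,12,874.000 ,0.000 ,0.000",
                "6420\u015350-00,67,JITK,2007,12,2292.000 ,0.000 ,0.000",  // U+0153, oe ligature
                "4804\u00A575-00,67,JITK,2007,12,1810.000 ,0.000 ,0.000"); // U+00A5, yen sign
        linesWithOtherCharacters(rows).forEach(System.out::println);
    }
}
```

Of the three sample rows, only the one with ¥ is returned: U+00A5 belongs to the Latin-1 Supplement block, while œ (U+0153) is part of Latin Extended-A and is therefore allowed.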

Related

Filter stringset that contain string with substring?

My column value is a string set. How do I filter for entries whose string set has an element containing a given substring?
For example:
entry1 :
{
ss : [
"TRUE_item",
"FALSE_item"
]
}
entry2 :
{
ss : [
"FASE_item",
"FALSE_item"
]
}
How do I filter for entries whose ss has an element containing TRUE? In this case, it should return entry1.
You cannot. You must give the entire value to match a string within a set.

Javafx 8 replace text in textarea and maintain formatting

We are trying to replace misspelled words in a TextArea. When the misspelled word is at the end of a line of text and is followed by a carriage return, the process fails; other misspelled words are replaced as expected.
Example text:
Well are we reddy for production the spell test is here
but I fear the dictionary is the limiting factor ?
Here is the carriage return test in the lin abov
Hypenated words test slow-motion and lets not forget the date
Just after the misspelled word abov there is a carriage return. In the ArrayList the text looks like this:
in, the, lin, abov
Because this misspelled word has no comma after it, the replacement code also removes the word Hypenated: it sees "abov" and "Hypenated" as being at the same index.
Result of running the replacement code:
Here is the carriage return test in the lin above words test
If the line strArray = line.split(" "); is changed to strArray = line.split("\\s"); the issue goes away, but so does the formatting in the TextArea: all the carriage returns are deleted, which is not a desired outcome.
The question is: how do we deal with the formatting issue and still replace the misspelled words?
Side note: this only happens when the misspelled word is at the end of a sentence; the misspelled word "lin", for example, is replaced as desired.
The project has a large number of lines of code, so we are only posting the code that causes the unsatisfactory results.
We tried using just a String[] array, with little or no success.
@FXML
private void onReplace(){
if(txtReplacementWord.getText().isEmpty()){
txtMessage.setText("No Replacement Word");
return;
}
cboMisspelledWord.getItems().remove(txtWordToReplace.getText());
// Line Above Removes misspelled word from cboMisspelledWord
// ==========================================================
String line = txaDiaryEntry.getText();
strArray = line.split(" ");
List<String> list = new ArrayList<>(Arrays.asList(strArray));
for (int R = 0; R < list.size(); R++) {
if(list.get(R).contains(txtWordToReplace.getText())){
theIndex = R;
System.out.println("## dex "+theIndex);//For testing
}
}
System.out.println("list "+list);//For testing
list.remove(theIndex);
list.add(theIndex,txtReplacementWord.getText());
sb = new StringBuilder();
for (String addWord : list) {
sb.append(addWord);
sb.append(" ");
}
txaDiaryEntry.setText(sb.toString());
txtMessage.setText("");
txtReplacementWord.setText("");
txtWordToReplace.setText("");
cboCorrectSpelling.getItems().clear();
cboMisspelledWord.requestFocus();
// Code above replaces misspelled word with correct spelling in TextArea
// =====================================================================
if(cboMisspelledWord.getItems().isEmpty()){
onCheckSpelling();
}
}
Don't use split: it throws away the content between the words. Instead, create a Pattern that matches words, and copy the substrings between matches as you go. That way no whitespace or line break is lost.
For simplicity, the following example reduces the replacement logic to a lookup in a Map, but it should be sufficient to demonstrate the approach:
public void start(Stage primaryStage) throws Exception {
TextArea textArea = new TextArea(
"Well are we reddy for production the spell test is here but I fear the dictionary is the limiting factor ?\n"
+ "\n" + "Here is the carriage return test in the lin abov\n" + "\n"
+ "Hypenated words test slow-motion and lets not forget the date");
Map<String, String> replacements = new HashMap<>();
replacements.put("lin", "line");
replacements.put("abov", "above");
Pattern pattern = Pattern.compile("\\S+"); // pattern matching words (=non-whitespace sequences in this case)
Button button = new Button("Replace");
button.setOnAction(evt -> {
String text = textArea.getText();
StringBuilder sb = new StringBuilder();
Matcher matcher = pattern.matcher(text);
int lastEnd = 0;
while (matcher.find()) {
int startIndex = matcher.start();
if (startIndex > lastEnd) {
// add missing whitespace chars
sb.append(text.substring(lastEnd, startIndex));
}
// replace text, if necessary
String group = matcher.group();
String result = replacements.get(group);
sb.append(result == null ? group : result);
lastEnd = matcher.end();
}
sb.append(text.substring(lastEnd));
textArea.setText(sb.toString());
});
final Scene scene = new Scene(new VBox(textArea, button));
primaryStage.setScene(scene);
primaryStage.show();
}
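The same loop can also be pulled out of the JavaFX handler into a plain static method, which makes it easy to check without starting a UI. This is only a sketch of that idea; the class and method names (WordReplacer, replaceWords) are hypothetical and not part of the original code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordReplacer {
    // A "word" is any run of non-whitespace characters, as in the example above.
    private static final Pattern WORD = Pattern.compile("\\S+");

    // Replaces whole words found in the map and copies everything between
    // the words (spaces, tabs, line breaks) through unchanged.
    public static String replaceWords(String text, Map<String, String> replacements) {
        StringBuilder sb = new StringBuilder();
        Matcher matcher = WORD.matcher(text);
        int lastEnd = 0;
        while (matcher.find()) {
            // Copy the whitespace between the previous word and this one.
            sb.append(text, lastEnd, matcher.start());
            String word = matcher.group();
            sb.append(replacements.getOrDefault(word, word));
            lastEnd = matcher.end();
        }
        // Copy whatever whitespace follows the last word.
        sb.append(text, lastEnd, text.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> replacements = new HashMap<>();
        replacements.put("lin", "line");
        replacements.put("abov", "above");
        String text = "Here is the carriage return test in the lin abov\n\nHypenated words test";
        System.out.println(replaceWords(text, replacements));
    }
}
```

In the handler, the call would then be along the lines of textArea.setText(WordReplacer.replaceWords(textArea.getText(), replacements)).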

JsonConvert.DeserializeObject not decoding special character

I am currently serializing an object as shown below.
"record" is a string array that contains special characters such as >, <, & etc.
The first element of record is "<", an XML special character, which SecurityElement.Escape(record[x]) converts to "&lt;".
When I then encode it with HttpUtility.JavaScriptStringEncode, it is converted to \u0026lt; .
var result = new Dictionary<string, string>();
string[] record = { "<", ">", "John & James" };
for (int x = 0; x < record.Length; x++)
{
string xmlEscaped = SecurityElement.Escape(record[x]);
result.Add($"F{235}_{"Property"}{x + 1}", HttpUtility.JavaScriptStringEncode(xmlEscaped));
}
string json= JsonConvert.SerializeObject(result);
The resulting json is:
{"F235_Property1":"\u0026lt;","F235_Property2":"\u0026gt;","F235_Property3":"John \u0026amp; James"}
When I deserialize the same json, I use
var jsonConverted = JsonConvert.DeserializeObject(json);
But after deserialization, the escaped special characters are not converted back to the originals.
Example: \u0026lt; is not converted back to "&lt;".
Please help me resolve this.
Thanks in advance.
Deserialize your json into a dictionary, then, for every value in its key-value pairs, reverse each step you applied before serialization.

Regex is not working for particular string

I have to find these strings using a regex:
(APP12345-85)
(APP12345XDP-85)
(APP12345X-85)
(APP12345-85) - not working for this one
(APP12345) - not working for this one
The original text looks like this:
.......some text 123 (APP12345-85) some text...............
My code is:
Regex rgx = new Regex(@"(APP|REG)[0-9]{5}[A-Z]{5}-[0-9]{2}", caseIgnore);
MatchCollection matches = rgx.Matches(@evalString);
if (matches.Count > 0)
{
//code
}
Any help will be appreciated.
You can also match these entries with
\b(APP|REG)[0-9]{5}[A-Z]{0,5}(?:-[0-9]{2})?\b
Looks like the uppercase letters are optional, so setting to {0,5} looks safe.
And this regex does not check the beginning/end of the string/line.
UPDATE:
Here is sample code for the updated example:
Regex rgx = new Regex(@"\((APP|REG)[0-9]{5}[A-Z]{0,5}(?:-[0-9]{2})?\)", RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches("(APP12345-85)");
if (matches.Count > 0)
{
//code
}
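To see how the optional parts of the pattern behave, here is a small sketch in Java (the pattern syntax used here is the same in both engines). It assumes the IDs always appear in parentheses, as in the updated example; the class and method names (AppIdFinder, findIds) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppIdFinder {
    // "(" + APP or REG + 5 digits + 0 to 5 letters + optional "-NN" + ")"
    private static final Pattern APP_ID = Pattern.compile(
            "\\((APP|REG)[0-9]{5}[A-Z]{0,5}(?:-[0-9]{2})?\\)",
            Pattern.CASE_INSENSITIVE);

    // Returns every parenthesized ID found anywhere in the text.
    public static List<String> findIds(String text) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = APP_ID.matcher(text);
        while (matcher.find()) {
            ids.add(matcher.group());
        }
        return ids;
    }

    public static void main(String[] args) {
        String text = "some text 123 (APP12345-85) some text (APP12345XDP-85) "
                + "more text (APP12345X-85) and (APP12345) end";
        System.out.println(findIds(text));
    }
}
```

All four forms from the question are found, including the one without letters and the one without the -85 suffix, because both parts are optional in the pattern.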

Cannot convert type "char" to "string" in Foreach loop

I have a hidden field that gets populated with a JavaScript array of IDs. When I try to iterate over the hidden field (called "hidExhibitsIDs"), I get the error in the title.
This is my loop:
foreach(string exhibit in hidExhibitsIDs.Value)
{
comLinkExhibitToTask.Parameters.AddWithValue("@ExhibitID", exhibit);
}
When I hover over .Value, it says it is a "string". When I change "string exhibit" to "int exhibit" it works, but gives me an internal error (not important right now).
You need to convert the string to a string array so that the loop yields strings rather than characters. Assuming a comma is the delimiter in the hidden field, Split converts the hidden field value to a string array:
foreach(string exhibit in hidExhibitsIDs.Value.Split(','))
{
comLinkExhibitToTask.Parameters.AddWithValue("@ExhibitID", exhibit);
}
Value is returning a String. When you do a foreach on a String, it iterates over the individual characters in it. What does the value actually look like? You'll have to parse it correctly before you try to use the data.
Here is roughly what your code is doing right now:
var myString = "Hey";
foreach (var c in myString)
{
Console.WriteLine(c);
}
Will output:
H
e
y
You can use Char.ToString to convert a char to a string.
Link: http://msdn.microsoft.com/en-us/library/3d315df2.aspx
Or, if you want to convert an array of chars to a string, you can use this:
char[] tab = new char[] { 'a', 'b', 'c', 'd' };
string str = new string(tab);
Value is a string, which implements IEnumerable<char>, so when you foreach over a string, it loops over each character.
I would run the debugger and see what the actual value of the hidden field is. It can't be an array, since when the POST happens, it is converted into a string.
On the server side, the Value property of a HiddenField (or HtmlInputHidden) is just a string, whose enumerator returns char structs. You'll need to split it to iterate over your IDs.
If you set the value of the hidden field on the client side with a JavaScript array, it will be a comma-separated string on the server side, so something like this will work:
foreach(string exhibit in hidExhibitsIDs.Value.Split(','))
{
comLinkExhibitToTask.Parameters.AddWithValue("@ExhibitID", exhibit);
}
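The parsing step can be made a little more defensive by trimming each piece and skipping empty ones. Below is a minimal sketch of that idea in Java, assuming the hidden field holds comma-separated IDs; the class and method names (HiddenFieldIds, parseIds) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class HiddenFieldIds {
    // Splits a comma-separated value into trimmed, non-empty ID strings.
    // Assumption: a comma is the only delimiter used by the client-side script.
    public static List<String> parseIds(String hiddenFieldValue) {
        List<String> ids = new ArrayList<>();
        if (hiddenFieldValue == null) {
            return ids;
        }
        for (String part : hiddenFieldValue.split(",")) {
            String id = part.trim();
            if (!id.isEmpty()) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Each element is a whole ID, not a single character.
        for (String id : parseIds("12, 15,,27")) {
            System.out.println(id);
        }
    }
}
```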
public static string reversewordsInsentence(string sentence)
{
string output = string.Empty;
string word = string.Empty;
foreach(char c in sentence)
{
if (c == ' ')
{
output = word + ' ' + output;
word = string.Empty;
}
else
{
word = word + c;
}
}
output = word + ' ' + output;
// The accumulated string always ends with a separator space, so trim it.
return output.TrimEnd();
}

Resources