Javafx 8 replace text in textarea and maintain formatting - javafx

We are trying to replace misspelled words in the TextArea and when the word is at the end of a line of text and has a carriage return the process is failing other misspelled words are replace as expected
Example Text Well are we reddy for production the spell test is here
but I fear the dictionary is the limiting factor ?
Here is the carriage return test in the lin abov
Hypenated words test slow-motion and lets not forget the date
Just after the misspelled word abov we have a carriage return in the ArrayList the text looks like this
in, the, lin, abov
Because this misspelled word has no comma after it the replacement code also takes out the misspelled word Hypenated because the replacement code sees "abov & Hypenated" as being at the same Index
Result of running the replacement code
Here is the carriage return test in the lin above words test
If this line of code strArray = line.split(" ");
is changed to this strArray = line.split("\\s"); the issue goes away but so does the formatting in the TextArea all the carriage returns are deleted which is not a desired outcome
The question is how to deal with the formatting issue and still replace the misspelled words?
Side note this only happens when the misspelled word is at the end of a sentences for example the misspelled word "lin" will be replaced as desired
We have an excessive number of lines of code for this project so we are only posting the code that is causing the unsatisfactory results
We tried using just a String[ ] array with little or no success
#FXML
private void onReplace(){
if(txtReplacementWord.getText().isEmpty()){
txtMessage.setText("No Replacement Word");
return;
}
cboMisspelledWord.getItems().remove(txtWordToReplace.getText());
// Line Above Removes misspelled word from cboMisspelledWord
// ==========================================================
String line = txaDiaryEntry.getText();
strArray = line.split(" ");
List<String> list = new ArrayList<>(Arrays.asList(strArray));
for (int R = 0; R < list.size(); R++) {
if(list.get(R).contains(txtWordToReplace.getText())){
theIndex = R;
System.out.println("## dex "+theIndex);//For testing
}
}
System.out.println("list "+list);//For testing
list.remove(theIndex);
list.add(theIndex,txtReplacementWord.getText());
sb = new StringBuilder();
for (String addWord : list) {
sb.append(addWord);
sb.append(" ");
}
txaDiaryEntry.setText(sb.toString());
txtMessage.setText("");
txtReplacementWord.setText("");
txtWordToReplace.setText("");
cboCorrectSpelling.getItems().clear();
cboMisspelledWord.requestFocus();
// Code above replaces misspelled word with correct spelling in TextArea
// =====================================================================
if(cboMisspelledWord.getItems().isEmpty()){
onCheckSpelling();
}
}

Don't use split. This way you loose the info about the content between the words. Instead create a Pattern matching words and make sure to also copy the substrings between matches. This way you don't loose any info there.
The following example replaces the replacement logic with simply looking for replacements in a Map for simplicity, but it should be sufficient to demonstrate the approach:
public void start(Stage primaryStage) throws Exception {
TextArea textArea = new TextArea(
"Well are we reddy for production the spell test is here but I fear the dictionary is the limiting factor ?\n"
+ "\n" + "Here is the carriage return test in the lin abov\n" + "\n"
+ "Hypenated words test slow-motion and lets not forget the date");
Map<String, String> replacements = new HashMap<>();
replacements.put("lin", "line");
replacements.put("abov", "above");
Pattern pattern = Pattern.compile("\\S+"); // pattern matching words (=non-whitespace sequences in this case)
Button button = new Button("Replace");
button.setOnAction(evt -> {
String text = textArea.getText();
StringBuilder sb = new StringBuilder();
Matcher matcher = pattern.matcher(text);
int lastEnd = 0;
while (matcher.find()) {
int startIndex = matcher.start();
if (startIndex > lastEnd) {
// add missing whitespace chars
sb.append(text.substring(lastEnd, startIndex));
}
// replace text, if necessary
String group = matcher.group();
String result = replacements.get(group);
sb.append(result == null ? group : result);
lastEnd = matcher.end();
}
sb.append(text.substring(lastEnd));
textArea.setText(sb.toString());
});
final Scene scene = new Scene(new VBox(textArea, button));
primaryStage.setScene(scene);
primaryStage.show();
}

Related

javafx 8 edit text in textarea

We are trying to correct the spelling of words in a TextArea
We have tried two methods onRemoveTwo failed with an IndexOutOfBoundsException
The other method onRemove works but it does a replaceAll we use replace in our code
Here are the results from both methods
Results from using onRemoveTwo
Initial text Take = Cariage NOT ME Carriag add missing voel to Carriage
Request was to correct "Cariage"
First correction results Take = Carriage NOT ME Carriag add missing voel to Carriage
With sb = sb.replace(from, to);
Request was to correct "Carriag"
Second correction results Take = Carriagee NOT MEg add missing voel to Carriage
We have this error Caused by: java.lang.IndexOutOfBoundsException
Caused by this line of code which we understand
while is finding two occurrence of the word
txaInput.replaceText(match.start(),match.end(),txtReplacementWord.getText());
Results from using onRemove
Initial text Take = Cariage NOT ME Carriag add missing voel to Carriage
Request was to correct "Cariage"
First correction results Take = Carriage NOT ME Carriag add missing voel to Carriage
Request was to correct "Carriag"
Second correction results Take = Carriagee NOT ME Carriage add missing voel to Carriagee
Notice both "Carriage" were changed to "Carriagee"
So our question is how to be more specific about the word to be corrected?
private void onRemoveTwo(){
if(txtReplacementWord.getText().isEmpty()){
txtMessage.setText("No Replacement Word");
return;
}
cboMisspelledWord.getItems().remove(txtWordToReplace.getText());
// Line Above Removes misspelled word from cboSelect
// ==================================================
String text = txaInput.getText();
String wordToFind = txtWordToReplace.getText();
Pattern word = Pattern.compile(wordToFind);
Matcher match = word.matcher(text);
while(match.find()){
///System.out.println("Found "+word+" "+ match.start() +" - "+ (match.end()-1));
String from = word.toString();
String to = txtReplacementWord.getText();
String sb = txaInput.getText();
sb = sb.replace(from, to);
txaInput.replaceText(match.start(),match.end(),txtReplacementWord.getText());
txtMessage.setText("");
txtReplacementWord.setText("");
txtWordToReplace.setText("");
cboCorrectSpelling.getItems().clear();
cboMisspelledWord.requestFocus();
// Code above replaces misspelled word with correct spelling in TextArea
// =====================================================================
int SIZE = cboMisspelledWord.getItems().size();
if(SIZE == 0){
onCheckSpelling();
}
}
}
Workable method but changes multiple words
#FXML
private void onReplace(){
if(txtReplacementWord.getText().isEmpty()){
txtMessage.setText("No Replacement Word");
return;
}
cboMisspelledWord.getItems().remove(txtWordToReplace.getText());
// Line Above Removes misspelled word from cboSelect
// ==================================================
String from = txtWordToReplace.getText();
String to = txtReplacementWord.getText();
String sb = txaInput.getText();
sb = sb.replace(from, to);
//sb = sb.replaceAll(from,to);
txaInput.setText("");
txaInput.setText(sb);
txtMessage.setText("");
txtReplacementWord.setText("");
txtWordToReplace.setText("");
cboCorrectSpelling.getItems().clear();
cboMisspelledWord.requestFocus();
// Code above replaces misspelled word with correct spelling in TextArea
// =====================================================================
int SIZE = cboMisspelledWord.getItems().size();
if(SIZE == 0){
onCheckSpelling();
}
}
Well #Grendel I have no idea why this question is voted down. I have been working on a similar project with a TextArea and liked your code and like you found it frustrating that the StringBuilder finds any occurrence of the characters. So here is an answer the code is not real neat you will need to clean it up. I was unhappy that I had to go to a String[] array then to a ArrayList will keep working on this issue
In spite of the down votes enjoy the code
Send the check to 90.83.140.38
#FXML
private void onReplace(){
if(txtReplacementWord.getText().isEmpty()){
txtMessage.setText("No Replacement Word");
return;
}
cboMisspelledWord.getItems().remove(txtWordToReplace.getText());
// Line Above Removes misspelled word from cboMisspelledWord
// ==========================================================
String line = txaInput.getText();
oneA = line.split("\\s");
List<String> list = new ArrayList<>(Arrays.asList(oneA));
int theIndex = list.indexOf(txtWordToReplace.getText());
String gotME = list.get(theIndex);
list.remove(theIndex);
list.add(theIndex,txtReplacementWord.getText());
sb = new StringBuilder();
for (String addWord : list) {
sb.append(addWord);
sb.append(" ");
}
txaInput.setText(sb.toString());
txtMessage.setText("");
txtReplacementWord.setText("");
txtWordToReplace.setText("");
cboCorrectSpelling.getItems().clear();
cboMisspelledWord.requestFocus();
// Code above replaces misspelled word with correct spelling in TextArea
// =====================================================================
if(cboMisspelledWord.getItems().isEmpty()){
onCheckSpelling();
}
}

Extract text with iText not works: encoding or crypted text?

I have a pdf file that as the follow security properties: printing: allowed; document assembly: NOT allowed; content copy: allowed; content copy for accessibility: allowed; page extraction:NOT allowed;
I try to get text with sample code as documentation sample as follow:
pdftext.Text = null;
StringBuilder text = new StringBuilder();
PdfReader pdfReader = new PdfReader(filename);
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
text.Append(System.Environment.NewLine);
text.Append("\n Page Number:" + page);
text.Append(System.Environment.NewLine);
currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
text.Append(currentText);
progressBar1.Value++;
}
pdftext.Text += text.ToString();
pdfReader.Close();
but the output text is lines with ""??? ? ???????\n?? ??? ? " values;
seems that file is crypted or we have a encoding problem...
note that in the follow lines
var f = pdfReader.IsOpenedWithFullPermissions; -> FALSE
var f1 = pdfReader.IsEncrypted(); - > FALSE
var f2 = pdfReader.ComputeUserPassword(); - > NULL
var f3 = pdfReader.Is128Key(); - > FALSE
var f4 = pdfReader.HasUsageRights();
f, f1, f3, f4 return FALSE ...than seems that the document is not crypted,
...so I don't know if is a Encoding problem or question related to encrypet strings...
Someone can help me?
thanks in advance.
G.G.
Whenever you have trouble extracting text from a document using standard code, the first thing to do is try and copy&paste the text from it using Adobe Acrobat Reader. Adobe Reader copy&paste implements text extraction according to the recommendations of the PDF specification, and if this fails, this usually means that the necessary information required for text extraction in the document are either missing or broken (by accident or by design). To extract the text, one either needs to customize the code specifically to the specific PDF or resort to OCR.
In case of the document at hand, Adobe Reader copy&paste does result in garbage, too, just like when extracting with iText. Thus, there is something fishy in the document.
Inspecting the document one finds that the fonts contain ToUnicode mappings like this:
/CIDInit /ProcSet
findresource begin 12 dict begin begincmap /CIDSystemInfo<</Registry(Adobe)
/Ordering(Identity)
/Supplement 0
>>
def
/CMapName/F18 def
1 begincodespacerange <0000> <FFFF> endcodespacerange
44 beginbfrange
<20> <20> <0020>
<21> <21> <E0F9>
<22> <22> <E0F1>
<23> <23> <E0FA>
<24> <24> <E0F7>
<25> <25> <E0A3>
<26> <26> <E084>
<27> <27> <E097>
<28> <28> <E098>
<29> <29> <E09A>
<2A> <2A> <E08A>
<2B> <2B> <E099>
<2C> <2C> <E0A5>
<2D> <2D> <E086>
<2E> <2E> <E094>
<2F> <2F> <E0DE>
<30> <30> <E0A6>
<31> <31> <E096>
<32> <32> <E088>
<33> <33> <E082>
<34> <34> <E04C>
<35> <35> <E0A4>
<36> <36> <E0F6>
<37> <37> <E0F2>
<38> <38> <E0D8>
<39> <39> <E0AA>
<3A> <3A> <E06C>
<3B> <3B> <E087>
<3C> <3C> <E095>
<3D> <3D> <E0C4>
<3E> <3E> <E07E>
<3F> <3F> <E055>
<40> <40> <E089>
<41> <41> <E085>
<42> <42> <E083>
<43> <43> <E070>
<44> <44> <E0E6>
<45> <45> <E080>
<46> <46> <E0C8>
<47> <47> <E0F4>
<48> <48> <E062>
<49> <49> <E0F3>
<4A> <4A> <E04E>
<4B> <4B> <E05E>
endbfrange
endcmap CMapName currentdict /CMap defineresource pop end end
I.e., if you are not into this, the fonts claim that all their glyphs (with the exception of the space glyph at 0x20) represent characters U+E0xx from the Unicode private use area. As the name of that area indicates, there is no common meaning of characters with these values.
Thus, text extraction according to the PDF specification will return strings of characters with undefined meaning with results as you observed in iText or I saw in Adobe Reader.
Sometimes in such a situation one can still enforce proper text extraction by ignoring the ToUnicode map and using either the font Encoding or information inside the embedded font program.
Unfortunately it turns out that here the Encoding effectively contains the same information as does the ToUnicode map, e.g. for the same font as above
/Differences [ 32 /space /uniE0F9 /uniE0F1 /uniE0FA /uniE0F7 /uniE0A3 /uniE084 /uniE097 /uniE098
/uniE09A /uniE08A /uniE099 /uniE0A5 /uniE086 /uniE094 /uniE0DE /uniE0A6 /uniE096
/uniE088 /uniE082 /uniE04C /uniE0A4 /uniE0F6 /uniE0F2 /uniE0D8 /uniE0AA /uniE06C
/uniE087 /uniE095 /uniE0C4 /uniE07E /uniE055 /uniE089 /uniE085 /uniE083 /uniE070
/uniE0E6 /uniE080 /uniE0C8 /uniE0F4 /uniE062 /uniE0F3 /uniE04E /uniE05E ]
and the fonts turns out to be Type3 fonts, i.e. there is no embedded font program but each glyph is defined as an individual PDF canvas without further character information.
Thus, nothing to gain here either.
Actually these small PDF canvasses contain inlined bitmap graphics of the respective glyph which also is the cause of the poor graphical quality of the document (if you don't see that immediately, simply zoom in a bit and you'll see the ragged outlines of the glyphs).
By the way, such a construct usually means that the producer of the PDF explicitly wants to prevent text extraction.
If you happen to have to extract text from many such documents, you can try and determine a mapping from their U+E0xx characters to actually sensible Unicode characters and apply that mapping to your extracted text.
If all those fonts in all those documents happen to use the same U+E0xx codepoints for the same actual characters, you'll be able to do text extraction from those documents after investing a certain amount of initial work.
Otherwise do try OCR.
The following code adds pages to a document which map the ToUnicode values to the characters shown:
void AddFontsTo(PdfReader reader, PdfStamper stamper)
{
int documentPages = reader.NumberOfPages;
for (int page = 1; page <= documentPages; page++)
{
// ignore inherited resources for now
PdfDictionary pageResources = reader.GetPageResources(page);
if (pageResources == null)
continue;
PdfDictionary pageFonts = pageResources.GetAsDict(PdfName.FONT);
if (pageFonts == null || pageFonts.Size == 0)
continue;
List<BaseFont> fonts = new List<BaseFont>();
List<string> fontNames = new List<string>();
HashSet<char> chars = new HashSet<char>();
foreach (PdfName key in pageFonts.Keys)
{
PdfIndirectReference fontReference = pageFonts.GetAsIndirectObject(key);
if (fontReference == null)
continue;
DocumentFont font = (DocumentFont) BaseFont.CreateFont((PRIndirectReference)fontReference);
if (font == null)
continue;
PdfObject toUni = PdfReader.GetPdfObjectRelease(font.FontDictionary.Get(PdfName.TOUNICODE));
CMapToUnicode toUnicodeCmap = null;
if (toUni is PRStream)
{
try
{
byte[] touni = PdfReader.GetStreamBytes((PRStream)toUni);
CidLocationFromByte lb = new CidLocationFromByte(touni);
toUnicodeCmap = new CMapToUnicode();
CMapParserEx.ParseCid("", toUnicodeCmap, lb);
}
catch
{
toUnicodeCmap = null;
}
}
if (toUnicodeCmap == null)
continue;
ICollection<int> mapValues = toUnicodeCmap.CreateDirectMapping().Values;
if (mapValues.Count == 0)
continue;
fonts.Add(font);
fontNames.Add(key.ToString());
foreach (int value in mapValues)
chars.Add((char)value);
}
if (fonts.Count == 0 || chars.Count == 0)
continue;
Rectangle size = (fonts.Count > 10) ? PageSize.A4.Rotate() : PageSize.A4;
PdfPTable table = new PdfPTable(fonts.Count + 1);
table.AddCell("Page " + page);
foreach (String name in fontNames)
{
table.AddCell(name);
}
table.HeaderRows = 1;
float[] widths = new float[fonts.Count + 1];
widths[0] = 2;
for (int i = 1; i <= fonts.Count; i++)
widths[i] = 1;
table.SetWidths(widths);
table.WidthPercentage = 100;
List<char> charList = new List<char>(chars);
charList.Sort();
foreach (char character in charList)
{
table.AddCell(((int)character).ToString("X4"));
foreach (BaseFont font in fonts)
{
table.AddCell(new PdfPCell(new Phrase(character.ToString(), new Font(font))));
}
}
stamper.InsertPage(reader.NumberOfPages + 1, size);
ColumnText columnText = new ColumnText(stamper.GetUnderContent(reader.NumberOfPages));
columnText.AddElement(table);
columnText.SetSimpleColumn(size);
while ((ColumnText.NO_MORE_TEXT & columnText.Go(false)) == 0)
{
stamper.InsertPage(reader.NumberOfPages + 1, size);
columnText.Canvas = stamper.GetUnderContent(reader.NumberOfPages);
columnText.SetSimpleColumn(size);
}
}
}
I applied it to your document like this:
string input = #"4700198773.pdf";
string output = #"4700198773-fonts.pdf";
using (PdfReader reader = new PdfReader(input))
using (FileStream stream = new FileStream(output, FileMode.Create, FileAccess.Write))
using (PdfStamper stamper = new PdfStamper(reader, stream))
{
AddFontsTo(reader, stamper);
}
The additional pages look like this:
Now you have to compare the outputs for the different fonts and pages of this document with each other and with those of a representative selection of file. If you find good enough a pattern, you can try this replacement way.

How to update a PDF file?

I am required to replace a word with a new word, selected from a drop-down list by user, in a PDF document in ASP.NET. I am using iTextSharp , but the new PDF that is created is all distorted as I am not able to extract the formatting/styling info of the PDF while extracting. Also, IS There a way to read a pdf line-by-line? Please help..
protected void Page_Load(object sender, EventArgs e)
{
String s = DropDownList1.SelectedValue;
Response.Write(s);
ListFieldNames(s);
}
private void CreatePDF(string text)
{
string outFileName = #"z:\TEMP\PDF\Test_abc.pdf";
Document doc = new Document();
doc.SetMargins(30f, 30f, 30f, 30f);
PdfWriter.GetInstance(doc, new FileStream(outFileName, FileMode.Create));
doc.Open();
BaseFont bfTimes = BaseFont.CreateFont(BaseFont.COURIER, BaseFont.CP1252, false);
Font times = new Font(bfTimes, 12, Font.BOLDITALIC);
//Chunk ch = new Chunk(text,times);
Paragraph para = new Paragraph(text,times);
//para.SpacingAfter = 9f;
para.Alignment = Element.ALIGN_CENTER;
//para.IndentationLeft = 100;
doc.Add(para);
//doc.Add(new Paragraph(text,times));
doc.Close();
Response.Redirect(#"z:\TEMP\PDF\Test_abc.pdf",false);
}
private void ListFieldNames(string s)
{
ArrayList arrCheck = new ArrayList();
try
{
string pdfTemplate = #"z:\TEMP\PDF\abc.pdf";
//string dest = #"z:\TEMP\PDF\Test_abc.pdf";
PdfReader pdfReader = new PdfReader(pdfTemplate);
string pdfText = string.Empty;
string extracttext = "";
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
PdfReader reader = new PdfReader((string)pdfTemplate);
extracttext = PdfTextExtractor.GetTextFromPage(reader, page, its);
extracttext = Encoding.Unicode.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.Unicode, Encoding.Default.GetBytes(extracttext)));
pdfText = pdfText + extracttext;
pdfText = pdfText.Replace("[xyz]", s);
pdfReader.Close();
}
CreatePDF(pdfText);
}
catch (Exception ex)
{
}
finally
{
}
}
You are making one wrong assumption after the other.
You assume that the concept of "lines" exists in PDF. This is wrong. In Text State, different snippets of text are drawn on the page at absolute positions. For every "show text" operator, iText will return a TextRenderInfo object with the portion of text that was drawn and its coordinates. One line can consist of multiple text snippets. A text snippet may contain whitespace or may even be empty.
You assume that all text in a PDF keeps its natural reading order. This should be true for PDF/UA (UA stands for Universal Accessibility), but it's certainly not true for most PDFs you can find in the wild. That's why iText provides location-based text extraction (see p521 of iText in Action, Second Edition). As explained on p516, the text "Hello World" can be stored in the PDF as "ld", "Wor", "llo", "He". The LocationTextExtractionStrategy will order all the text snippets, reconstructing words if necessary. For instance: it will concatenate "He" and "llo" to "Hello", because there's not sufficient space between the "He" snippet and the "llo" snippet. However, for reasons unknown (probably ignorance), you're using the SimpleTextExtractionStrategy which doesn't order the text based on its location.
You are completely ignoring all the Graphics State operators, as well as the Text State operators that define the font, etc...
You assume that PDF is a Word processing format. This is wrong on many levels, as is your code. Please read the intro of chapter 6 of my book.
All these wrong assumptions almost make me want to vote down your question. At the risk of being voted down myself for this answer, I must tell you that you shouldn't try to "do the same". You're asking something that is very complex, and in many cases even impossible!

Cannot convert type "char" to "string" in Foreach loop

I have a hidden field that gets populated with a javascript array of ID's. When I try to iterate the hidden field(called "hidExhibitsIDs") it gives me an error(in the title).
this is my loop:
foreach(string exhibit in hidExhibitsIDs.Value)
{
comLinkExhibitToTask.Parameters.AddWithValue("#ExhibitID", exhibit);
}
when I hover over the .value it says it is "string". But when I change the "string exhibit" to "int exhibit" it works, but gives me an internal error(not important right now).
You need to convert string to string array to using in for loop to get strings not characters as your loop suggests. Assuming comma is delimiter character in the hidden field, hidden field value will be converted to string array by split.
foreach(string exhibit in hidExhibitsIDs.Value.Split(','))
{
comLinkExhibitToTask.Parameters.AddWithValue("#ExhibitID", exhibit);
}
Value is returning a String. When you do a foreach on a String, it iterates over the individual characters in it. What does the value actually look like? You'll have to parse it correctly before you try to use the data.
Example of what your code is somewhat doing right now:
var myString = "Hey";
foreach (var c in myString)
{
Console.WriteLine(c);
}
Will output:
H
e
y
You can use Char.ToString in order to convert
Link : http://msdn.microsoft.com/en-us/library/3d315df2.aspx
Or you can use this if you want convert your tab of char
char[] tab = new char[] { 'a', 'b', 'c', 'd' };
string str = new string(tab);
Value is a string, which implements IEnumerable<char>, so when you foreach over a string, it loops over each character.
I would run the debugger and see what the actual value of the hidden field is. It can't be an array, since when the POST happens, it is converted into a string.
On the server side, The Value property of a HiddenField (or HtmlInputHidden) is just a string, whose enumerator returns char structs. You'll need to split it to iterate over your IDs.
If you set the value of the hidden field on the client side with a JavaScript array, it will be a comma-separated string on the server side, so something like this will work:
foreach(string exhibit in hidExhibitsIDs.Value.Split(','))
{
comLinkExhibitToTask.Parameters.AddWithValue("#ExhibitID", exhibit);
}
public static string reversewordsInsentence(string sentence)
{
string output = string.Empty;
string word = string.Empty;
foreach(char c in sentence)
{
if (c == ' ')
{
output = word + ' ' + output;
word = string.Empty;
}
else
{
word = word + c;
}
}
output = word + ' ' + output;
return output;
}

Function to convert "camel case" type text to text with spaces in between? ie: HelloWorld --> Hello World

Anyone know of a nice efficient function that could convert, for example:
HelloWorld --> Hello World
helloWorld --> Hello World
Hello_World --> Hello World
hello_World --> Hello World
It would be nice to be able to handle all these situations.
Preferably in in VB.Net, or C#.
I donĀ“t know if this is the most efficient way. But this method works fine:
EDIT 1: I have include Char.IsUpper suggestion in the comments
EDIT 2: included another suggestion in the comments: ToCharArray is superfluous because string implements enumerable ops as a char too, i.e. foreach (char character in input)
EDIT 3: I've used StringBuilder, like #Dan commented.
public string CamelCaseToTextWithSpaces(string input)
{
StringBuilder output = new StringBuilder();
input = input.Replace("_", "");
foreach (char character in input)
{
if (char.IsUpper(character))
{
output.Append(' ');
}
if (output.Length == 0)
{
// The first letter must be always UpperCase
output.Append(Char.ToUpper(character));
}
else
{
output.Append(character);
}
}
return output.ToString().Trim();
}
There are some other possibilities you might want to cater for - for instance, you probably don't want to add spaces to abbreviations/acronyms.
I'd recommend using:
Private CamelCaseConverter As Regex = New Regex("(?<char1>[0-9a-z])(?<char2>[A-Z])", RegexOptions.Compiled + RegexOptions.CultureInvariant)
Public Function CamelCaseToWords(CamelCaseString As String) As String
Return CamelCaseConverter.Replace(CamelCaseString, "${char1} ${char2}")
End Function
'Gives:
'CamelCase => Camel Case
'PIN => PIN
What it doesn't do is uppercase the first letter of the first word, but you can look at the other examples for ways of doing that, or maybe someone can come up with a clever RegEx way of doing it.
Sounded fun so I coded it the most important part is the regex take a look at this site for more documentation.
private static string BreakUpCamelCase(string s)
{
MatchCollection MC = Regex.Matches(s, #"[0-9a-z][A-Z]");
int LastMatch = 0;
System.Text.StringBuilder SB = new StringBuilder();
foreach (Match M in MC)
{
SB.AppendFormat("{0} ", s.Substring(LastMatch, M.Index + 1 - LastMatch));
LastMatch = M.Index + 1;
}
if (LastMatch < s.Length)
{
SB.AppendFormat("{0} ", s.Substring(LastMatch));
}
return SB.ToString();
}

Resources