I have a string that contains some text followed by a blank line. What's the best way to keep the part with text, but remove the whitespace newline from the end?
Use String.trim() method to get rid of whitespaces (spaces, new lines etc.) from the beginning and end of the string.
String trimmedString = myString.trim();
String.replaceAll("[\n\r]", "");
This Java code does exactly what is asked in the title of the question, that is "remove newlines from beginning and end of a string-java":
String.replaceAll("^[\n\r]", "").replaceAll("[\n\r]$", "")
Remove newlines only from the end of the line:
String.replaceAll("[\n\r]$", "")
Remove newlines only from the beginning of the line:
String.replaceAll("^[\n\r]", "")
tl;dr
String cleanString = dirtyString.strip() ; // Call new `String::string` method.
String::strip…
The old String::trim method has a strange definition of whitespace.
As discussed here, Java 11 adds new strip… methods to the String class. These use a more Unicode-savvy definition of whitespace. See the rules of this definition in the class JavaDoc for Character::isWhitespace.
Example code.
String input = " some Thing ";
System.out.println("before->>"+input+"<<-");
input = input.strip();
System.out.println("after->>"+input+"<<-");
Or you can strip just the leading or just the trailing whitespace.
You do not mention exactly what code point(s) make up your newlines. I imagine your newline is likely included in this list of code points targeted by strip:
It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
It is '\t', U+0009 HORIZONTAL TABULATION.
It is '\n', U+000A LINE FEED.
It is '\u000B', U+000B VERTICAL TABULATION.
It is '\f', U+000C FORM FEED.
It is '\r', U+000D CARRIAGE RETURN.
It is '\u001C', U+001C FILE SEPARATOR.
It is '\u001D', U+001D GROUP SEPARATOR.
It is '\u001E', U+001E RECORD SEPARATOR.
It is '\u001F', U+0
If your string is potentially null, consider using StringUtils.trim() - the null-safe version of String.trim().
If you only want to remove line breaks (not spaces, tabs) at the beginning and end of a String (not inbetween), then you can use this approach:
Use a regular expressions to remove carriage returns (\\r) and line feeds (\\n) from the beginning (^) and ending ($) of a string:
s = s.replaceAll("(^[\\r\\n]+|[\\r\\n]+$)", "")
Complete Example:
public class RemoveLineBreaks {
public static void main(String[] args) {
var s = "\nHello world\nHello everyone\n";
System.out.println("before: >"+s+"<");
s = s.replaceAll("(^[\\r\\n]+|[\\r\\n]+$)", "");
System.out.println("after: >"+s+"<");
}
}
It outputs:
before: >
Hello world
Hello everyone
<
after: >Hello world
Hello everyone<
I'm going to add an answer to this as well because, while I had the same question, the provided answer did not suffice. Given some thought, I realized that this can be done very easily with a regular expression.
To remove newlines from the beginning:
// Trim left
String[] a = "\n\nfrom the beginning\n\n".split("^\\n+", 2);
System.out.println("-" + (a.length > 1 ? a[1] : a[0]) + "-");
and end of a string:
// Trim right
String z = "\n\nfrom the end\n\n";
System.out.println("-" + z.split("\\n+$", 2)[0] + "-");
I'm certain that this is not the most performance efficient way of trimming a string. But it does appear to be the cleanest and simplest way to inline such an operation.
Note that the same method can be done to trim any variation and combination of characters from either end as it's a simple regex.
Try this
function replaceNewLine(str) {
return str.replace(/[\n\r]/g, "");
}
String trimStartEnd = "\n TestString1 linebreak1\nlinebreak2\nlinebreak3\n TestString2 \n";
System.out.println("Original String : [" + trimStartEnd + "]");
System.out.println("-----------------------------");
System.out.println("Result String : [" + trimStartEnd.replaceAll("^(\\r\\n|[\\n\\x0B\\x0C\\r\\u0085\\u2028\\u2029])|(\\r\\n|[\\n\\x0B\\x0C\\r\\u0085\\u2028\\u2029])$", "") + "]");
Start of a string = ^ ,
End of a string = $ ,
regex combination = | ,
Linebreak = \r\n|[\n\x0B\x0C\r\u0085\u2028\u2029]
Another elegant solution.
String myString = "\nLogbasex\n";
myString = org.apache.commons.lang3.StringUtils.strip(myString, "\n");
For anyone else looking for answer to the question when dealing with different linebreaks:
string.replaceAll("(\n|\r|\r\n)$", ""); // Java 7
string.replaceAll("\\R$", ""); // Java 8
This should remove exactly the last line break and preserve all other whitespace from string and work with Unix (\n), Windows (\r\n) and old Mac (\r) line breaks: https://stackoverflow.com/a/20056634, https://stackoverflow.com/a/49791415. "\\R" is matcher introduced in Java 8 in Pattern class: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
This passes these tests:
// Windows:
value = "\r\n test \r\n value \r\n";
assertEquals("\r\n test \r\n value ", value.replaceAll("\\R$", ""));
// Unix:
value = "\n test \n value \n";
assertEquals("\n test \n value ", value.replaceAll("\\R$", ""));
// Old Mac:
value = "\r test \r value \r";
assertEquals("\r test \r value ", value.replaceAll("\\R$", ""));
String text = readFileAsString("textfile.txt");
text = text.replace("\n", "").replace("\r", "");
I'm trying to figure out how I can get the last text inside node which is the ID 104543.
<id>tag:website.com:feed/web/main/104543</id>
The output should be 104543.
If you are using .NET 3.5 or higher, then you could use the XElement class in System.Xml.Linq.
You could retrieve the tag element content in the following way:
string str = #"<id>tag:website.com:feed/web/main/104543</id>";
XElement element = XElement.Parse(str);
var content = element.Descendants("id").FirstOrDefault().Value;
Now, parsing the content depends on how this is structured: if the code you want to extract will always be placed after the last "/" character, then you could do the following:
string code = content.Split(new[] { "/" }, StringSplitOptions.None).Last();
i need to put all text of a docx in a stringBuilder, also with tab and hyphen.
i've tried the use of org.docx4j.TextUtils, but in the resultant string doesn't seen tab.
String inputfilepath = System.getProperty("user.home") + "test.docx";
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
org.docx4j.wml.Document wmlDocumentEl = (org.docx4j.wml.Document)documentPart.getJaxbElement();
Writer out = new OutputStreamWriter(System.out);
extractText(wmlDocumentEl, out);
out.close();
As per my answer at http://www.docx4java.org/forums/docx-java-f6/is-it-possible-to-extract-all-text-also-tab-and-hyphen-t1996.html#p6933?sid=b0d58fec2ba349d0f3f49cf66411397c
The problem with tab and hyphen, as I guess you know, is that they aren't represented in the docx as normal characters.
Tab is w:tab
A hyphen might be a hyphen character, or it might be displayed (without being actually in the docx), or it might be:
http://webapp.docx4java.org/OnlineDemo/ecma376/WordML/noBreakHyphen.html
or http://webapp.docx4java.org/OnlineDemo/ecma376/WordML/softHyphen.html
Replicating Word's hyphenation behaviour would be a challenge.
But for the others, there are three approaches which occur to me:
generalising your traverse approach (are you using TraversalUtil.getChildrenImpl?)
doing it in XSLT (you can do this in docx4j, but XSLT is probably slower, and a mix of technologies)
marshal the main document part to a string, do suitable string replacements, then unmarshal, then use TextUtils
For (3), assuming MainDocumentPart mdp, to get it as a String:
String stringContent = mdp.getXML();
Then to inject the modified content:
mdp.setContents((Document)XmlUtils.unmarshalString(stringContent) );
I am trying to upload a CSV file that has special characters using ServletFileUpload of apache common. But the special characters present in the CSV are being stored as junk characters in the database. The special characters I have are Trademark, registered etc. Following is the code snippet.
ServletFileUpload upload = new ServletFileUpload();
FileItemIterator iter = upload.getItemIterator(request);
while (iter.hasNext()) {
FileItemStream item = iter.next();
String name = item.getFieldName();
InputStream stream = item.openStream();
if (item.isFormField()) {
System.out.println("Form field " + name + " with value "
+ Streams.asString(stream, "UTF-8") + " detected.");
}
}
I have tried reading it using BufferendReader, used request.setCharacterEncoding("UTF-8"), tried upload.setHeaderEncoding("UTF-8") and also checked with IOUtils.copy() method, but none of them worked.
Please advice how to get rid of this issue and where it needs to be addressed? Is there anything I need to do beyond servlet code?
Thanks
What database are using? What character set is database using? Characters can be malformed in the database rather than in Java code.
I have code like this
string pattern = "<(.|\n)+?>";
System.Text.RegularExpressions.Regex regEx = new System.Text.RegularExpressions.Reg(pattern);
string result = "";
result = regEx.Replace(htmlText, "");
In this "htmlText" will have some html code which also contains break tags. Right now its replacing all the html tags, but I want to leave break tag and replace the rest.
How can i do it? Anybody have any idea?
Thanks
You could try this one:
<(?!br|/br).+?>
This should work:
string html = "<span>test<br><br /></span>";
Regex regex = new Regex("<[^(?!br)>]*>", RegexOptions.Compiled);
string result = regex.Replace(html, string.Empty);