How do I remove all page formatting from NSString

I have an NSMutableString* (theBigString) that contains text loaded from an online file. I am writing theBigString to a local file on the iPad and then emailing it as an attachment.
[theBigString appendString:[[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding]];
Despite my best efforts with the following lines of code, when I open the email attachment there are still multiple lines of text instead of the single long line I expected (and need for my app to work).
[theBigString stringByReplacingOccurrencesOfString:@"\\s+" withString:@" "];
[theBigString stringByReplacingOccurrencesOfString:@"\n" withString:@" "];
[theBigString stringByReplacingOccurrencesOfString:@"\r" withString:@" "];
[theBigString stringByReplacingOccurrencesOfString:@"\r\n" withString:@" "];
If I open the text file and view its formatting marks, it still shows a couple of new-paragraph symbols.
Is there another newline type character besides "\n" or "\r" that I am not stripping with the above code?

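NSCharacterSet defines [NSCharacterSet newlineCharacterSet] as U+000A–U+000D and U+0085, so besides \n (U+000A) and \r (U+000D) it also covers U+000B, U+000C and U+0085 (NEXT LINE), none of which your code strips.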
If you don't care too much about efficiency, you can also split the string into an array at every newline character and join the pieces back together with an empty string.
NSArray* stringComponents = [theBigString componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]];
theBigString = [[stringComponents componentsJoinedByString:@""] mutableCopy]; // mutableCopy because theBigString is an NSMutableString
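The same idea can be sketched in Java. Its regex construct \R matches any line-break sequence (\r\n as one unit, or a single U+000A to U+000D, U+0085, U+2028 or U+2029), so one replacement covers every newline type. The sketch also illustrates something worth checking in the question's code: a replace call returns a new string rather than changing the original, which is equally true of stringByReplacingOccurrencesOfString:withString:, so the result has to be assigned back. Each break is replaced with a space, as the question's code attempted; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class LineBreaks {
    // \R matches "\r\n" as a single unit, or any one of
    // U+000A to U+000D, U+0085, U+2028, U+2029 (Java 8+).
    private static final Pattern LINE_BREAK = Pattern.compile("\\R");

    // Returns a NEW string with every line break replaced by one space;
    // the input string itself is never modified.
    static String toSingleLine(String text) {
        return LINE_BREAK.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        String s = "one\r\ntwo\nthree\rfour\u0085five\u2028six";
        String flat = toSingleLine(s); // must keep the returned value
        System.out.println(flat);      // one two three four five six
    }
}
```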

Related

Replacing a new line character with StreamWriter removes everything after it (ASP.NET, JSON, C#)

I'm having an unexpected problem which I'm hoping one of you can help me with.
I have an ASP.NET Web API with a number of endpoints, one of which takes user input received as JSON and converts it into an order object, which is written to a file in CSV format. The following is a small snippet of the full code as an example.
using (StreamWriter writer = new StreamWriter(file))
{
    writer.Write(escape + order.Notes + escape + delim);
    writer.Write(escape + order.Reference1 + escape + delim);
    writer.Write(escape + order.Reference2 + escape + delim);
    writer.WriteLine();
    writer.Flush();
}
The problem I am having is that some users are inclined to add line breaks in certain fields, and this is causing havoc with my order file. In order to remove these new line characters, I have tried both of the following methods.
writer.Write(escape + product.Notes.Replace("\n", " ") + escape + delim);
writer.Write(escape + product.Notes.Replace(System.Environment.NewLine, " ") + escape + delim);
However, it seems that rather than just removing the new line character and carrying on writing the rest of the fields, nothing else gets written once a new line is encountered.
Either everything else gets replaced with " " or nothing else is being written at all, but I'm not sure which.
If I remove the .Replace() the whole file is written again but with extra line breaks.
I hope somebody has experienced this one and knows the answer!
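One possible cause worth ruling out is that each Replace call above handles only one line-ending form: replacing "\n" leaves a stray "\r" behind, and replacing Environment.NewLine ("\r\n" on Windows) misses a bare "\n", and either leftover may still be read as a row break by whatever opens the file. A minimal Java sketch of a field writer that normalizes all three forms in one pass, assuming the question's escape is a double quote and delim is a comma (both assumptions); it does not explain the truncation described above. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class CsvFields {
    // Windows (\r\n), old Mac (\r) and Unix (\n) line endings; \r\n is tried first
    // so a Windows break becomes one space, not two.
    private static final String LINE_ENDING = "\r\n|\r|\n";

    // Flattens one field onto a single line and wraps it in quotes,
    // doubling any embedded double quotes as RFC 4180 requires.
    static String field(String value) {
        String flat = (value == null ? "" : value).replaceAll(LINE_ENDING, " ");
        return "\"" + flat.replace("\"", "\"\"") + "\"";
    }

    // Builds one comma-separated CSV row from the given field values.
    static String row(List<String> values) {
        return values.stream().map(CsvFields::field).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(row(List.of("Leave at\r\nback door", "REF-1", "say \"hi\"")));
        // "Leave at back door","REF-1","say ""hi"""
    }
}
```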

Show all text of a docx in a stringBuilder with docx4j

I need to put all the text of a docx into a StringBuilder, including tabs and hyphens.
I've tried org.docx4j.TextUtils, but the tabs don't appear in the resulting string.
String inputfilepath = System.getProperty("user.home") + java.io.File.separator + "test.docx";
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
org.docx4j.wml.Document wmlDocumentEl = (org.docx4j.wml.Document)documentPart.getJaxbElement();
Writer out = new OutputStreamWriter(System.out);
extractText(wmlDocumentEl, out);
out.close();
As per my answer at http://www.docx4java.org/forums/docx-java-f6/is-it-possible-to-extract-all-text-also-tab-and-hyphen-t1996.html#p6933?sid=b0d58fec2ba349d0f3f49cf66411397c
The problem with tab and hyphen, as I guess you know, is that they aren't represented in the docx as normal characters.
A tab is a w:tab element.
A hyphen might be an ordinary hyphen character, or it might only be displayed by Word (without actually being in the docx), or it might be one of these elements:
http://webapp.docx4java.org/OnlineDemo/ecma376/WordML/noBreakHyphen.html
or http://webapp.docx4java.org/OnlineDemo/ecma376/WordML/softHyphen.html
Replicating Word's hyphenation behaviour would be a challenge.
But for the others, three approaches occur to me:
1. generalising your traversal approach (are you using TraversalUtil.getChildrenImpl?)
2. doing it in XSLT (you can do this in docx4j, but XSLT is probably slower, and a mix of technologies)
3. marshalling the main document part to a string, doing suitable string replacements, then unmarshalling it and using TextUtils
For (3), assuming MainDocumentPart mdp, to get it as a String:
String stringContent = mdp.getXML();
Then to inject the modified content:
mdp.setContents((Document)XmlUtils.unmarshalString(stringContent) );
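A rough sketch of the traversal idea (approach 1), but without docx4j's own traversal utilities: it walks the main document part's XML (for example the string returned by mdp.getXML()) with the JDK's built-in DOM parser, appending w:t text, a tab for each w:tab that sits inside a run, and a newline after each paragraph. Mapping w:noBreakHyphen to '-' and w:softHyphen to U+00AD are assumptions, and Word's automatic hyphenation is ignored entirely. The class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DocxPlainText {
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Parses a main-document-part XML string and returns its text, keeping tabs.
    static String extract(String documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getNamespaceURI()/getLocalName()
        org.w3c.dom.Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(documentXml)));
        StringBuilder sb = new StringBuilder();
        walk(doc.getDocumentElement(), sb);
        return sb.toString();
    }

    // Depth-first walk over child elements in document order.
    private static void walk(Element parent, StringBuilder sb) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            Element e = (Element) n;
            String name = W.equals(e.getNamespaceURI()) ? e.getLocalName() : "";
            switch (name) {
                case "t":             // visible run text
                    sb.append(e.getTextContent());
                    break;
                case "tab":           // only a w:tab directly inside w:r is a tab character;
                    if ("r".equals(parent.getLocalName())) sb.append('\t');
                    break;            // w:tab inside w:tabs is a tab-stop definition
                case "noBreakHyphen": // assumption: render as an ordinary hyphen
                    sb.append('-');
                    break;
                case "softHyphen":    // assumption: keep as U+00AD SOFT HYPHEN
                    sb.append('\u00AD');
                    break;
                case "p":             // end each paragraph with a newline
                    walk(e, sb);
                    sb.append('\n');
                    break;
                default:              // body, runs, tables, properties, other namespaces
                    walk(e, sb);
            }
        }
    }
}
```

Because it only reads the XML, the result can be appended to any StringBuilder; it does not need the unmarshal step of approach 3.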

CSV file (with special characters) upload encoding issue

I am trying to upload a CSV file that contains special characters using ServletFileUpload from Apache Commons FileUpload, but the special characters in the CSV are being stored as junk characters in the database. The special characters I have include the trademark and registered signs. Here is the code snippet:
ServletFileUpload upload = new ServletFileUpload();
FileItemIterator iter = upload.getItemIterator(request);
while (iter.hasNext()) {
    FileItemStream item = iter.next();
    String name = item.getFieldName();
    InputStream stream = item.openStream();
    if (item.isFormField()) {
        System.out.println("Form field " + name + " with value "
            + Streams.asString(stream, "UTF-8") + " detected.");
    }
}
I have tried reading it with a BufferedReader, used request.setCharacterEncoding("UTF-8"), tried upload.setHeaderEncoding("UTF-8"), and also checked with the IOUtils.copy() method, but none of them worked.
Please advise how to get rid of this issue and where it needs to be addressed. Is there anything I need to do beyond the servlet code?
Thanks
What database are you using, and what character set is the database using? The characters could be getting corrupted in the database rather than in the Java code.
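On the Java side, junk characters typically appear when bytes are decoded with a different charset from the one they were written in. A minimal sketch that reads a stream with an explicitly chosen charset and shows the classic symptom: the UTF-8 bytes of "™" decoded as windows-1252 become "â„¢". In the servlet, the branch for the uploaded file (the snippet above only handles form fields) could pass item.openStream() to such a method with the charset the CSV was actually saved in; that charset is an assumption you have to confirm, since Excel often saves CSV in the system code page (for example windows-1252) rather than UTF-8. The database column must also use a character set that can store these characters. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UploadDecoding {
    // Reads an entire stream as text using an explicitly chosen charset
    // instead of relying on the platform default.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The trademark sign encoded as UTF-8 is the three bytes E2 84 A2.
        byte[] utf8Bytes = "\u2122".getBytes(StandardCharsets.UTF_8);

        // Matching charset: the trademark sign survives.
        System.out.println(readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8));

        // Mismatched charset: each byte becomes its own character.
        System.out.println(readAll(new ByteArrayInputStream(utf8Bytes), Charset.forName("windows-1252")));
    }
}
```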

stringByAddingPercentEscapesUsingEncoding adds unexpected characters?

I'm having a hard time getting my NSURL to work. When I build the final string before converting it to a URL, an unwanted character gets added to the end of the string. Why is this happening, and how can I fix it?
Here is my code:
NSString *remotepathstring = [[NSString alloc] init];
remotepathstring=newdata.remotepath;
NSLog(@"remotepathstring = %@",remotepathstring);
NSString *remotepathstringwithescapes = [remotepathstring stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"remotepathstring = %@",remotepathstringwithescapes);
remotepathURL =[NSURL URLWithString:remotepathstringwithescapes];
NSLog(@"RemotePathUrl=%@",remotepathURL);
Log outputs as follows:
"remotepathstring = http://nalahandthepinktiger.com/wp-content/uploads/nalah-sheet-5.pdf‎"
"remotepathstring = http://nalahandthepinktiger.com/wp-content/uploads/nalah-sheet-5.pdf%E2%80%8E"
"RemotePathUrl=http://nalahandthepinktiger.com/wp-content/uploads/nalah-sheet-5.pdf%E2%80%8E"
The sequence %E2%80%8E is the percent-encoded UTF-8 form of U+200E LEFT-TO-RIGHT MARK. This character is present in your original remotepathstring, but it is invisible when printed with NSLog.
The question becomes: how does newdata.remotepath get populated in the first place? Somewhere along the line it sounds like you need to perform some extra cleanup of input strings to strip out such a character.
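One way to do that cleanup, sketched in Java under the assumption that no legitimate URL you handle needs invisible formatting characters: strip everything in the Unicode Cf (format) category, which includes U+200E, and trim surrounding whitespace before building the URL. The sketch also confirms that U+200E percent-encodes to exactly the %E2%80%8E seen in the log (URLEncoder.encode with a Charset argument needs Java 10+). The class name and example URL are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class InvisibleChars {
    // \p{Cf} is the Unicode "format" category: invisible characters such as
    // LEFT-TO-RIGHT MARK (U+200E), RIGHT-TO-LEFT MARK (U+200F),
    // ZERO WIDTH SPACE (U+200B) and the byte order mark (U+FEFF).
    private static final Pattern FORMAT_CHARS = Pattern.compile("\\p{Cf}");

    // Removes invisible format characters, then trims surrounding whitespace.
    static String clean(String raw) {
        return FORMAT_CHARS.matcher(raw).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // U+200E is three UTF-8 bytes, hence three percent escapes.
        System.out.println(URLEncoder.encode("\u200E", StandardCharsets.UTF_8)); // %E2%80%8E

        String raw = "http://example.com/sheet-5.pdf\u200E";
        System.out.println(clean(raw)); // http://example.com/sheet-5.pdf
    }
}
```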
Unrelated to the core question, it would seem you're a newcomer to Objective-C. This code is redundant and wasteful:
NSString *remotepathstring = [[NSString alloc] init];
remotepathstring=newdata.remotepath;
You create a string, only to immediately throw it away and replace it with another. If you're not using ARC, this has the additional problem of leaking! Instead do:
NSString *remotepathstring = newdata.remotepath;

How to encode the plus (+) symbol in a URL

The URL link below will open a new Google mail window. The problem I have is that Google replaces all the plus (+) signs in the email body with blank spaces. It looks like it only happens with the + sign. How can I remedy this? (I am working on an ASP.NET web page.)
https://mail.google.com/mail?view=cm&tf=0&to=someemail@somedomain.com&su=some subject&body=Hi there+Hello there
(In the body email, "Hi there+Hello there" will show up as "Hi there Hello there")
The + character has a special meaning in the query segment of a URL: it means a space. If you want to use a literal + sign there, you need to URL-encode it as %2b:
body=Hi+there%2bHello+there
Here's an example of how you could properly generate URLs in .NET:
var uriBuilder = new UriBuilder("https://mail.google.com/mail");
var values = HttpUtility.ParseQueryString(string.Empty);
values["view"] = "cm";
values["tf"] = "0";
values["to"] = "someemail@somedomain.com";
values["su"] = "some subject";
values["body"] = "Hi there+Hello there";
uriBuilder.Query = values.ToString();
Console.WriteLine(uriBuilder.ToString());
The result:
https://mail.google.com:443/mail?view=cm&tf=0&to=someemail%40somedomain.com&su=some+subject&body=Hi+there%2bHello+there
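For comparison, a Java sketch of the same approach using java.net.URLEncoder (the Charset overload needs Java 10+), which applies the same form-encoding rules as the .NET output above: a space becomes +, a literal + becomes %2B, and @ becomes %40. It writes uppercase hex (%2B rather than %2b), which makes no difference because percent-encoding hex digits are case-insensitive. The class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class QueryStrings {
    // Encodes each name and value with application/x-www-form-urlencoded rules
    // and joins the pairs with "&".
    static String build(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("view", "cm");
        params.put("to", "someemail@somedomain.com");
        params.put("su", "some subject");
        params.put("body", "Hi there+Hello there");
        System.out.println("https://mail.google.com/mail?" + build(params));
        // https://mail.google.com/mail?view=cm&to=someemail%40somedomain.com&su=some+subject&body=Hi+there%2BHello+there
    }
}
```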
If you want a plus + symbol in the body, you have to encode it as %2B.
In order to encode a + value using JavaScript, you can use the encodeURIComponent function.
Example:
var url = "+11";
var encoded_url = encodeURIComponent(url);
console.log(encoded_url)
It's safer to always percent-encode all characters except those defined as "unreserved" in RFC-3986.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
So, percent-encode the plus character and other special characters.
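A minimal sketch of that rule in Java: percent-encode every byte of the string's UTF-8 encoding (UTF-8 is an assumption, though it is the usual choice for URLs) unless it is one of the unreserved characters listed above. Unlike URLEncoder, it writes a space as %20 rather than +, and for the question's body text it produces the same result as the Uri.EscapeDataString output quoted in this thread. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Rfc3986 {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // True for RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
    private static boolean isUnreserved(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') || (b >= '0' && b <= '9')
                || b == '-' || b == '.' || b == '_' || b == '~';
    }

    // Percent-encodes every UTF-8 byte that is not unreserved (space -> %20, "+" -> %2B).
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte raw : s.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF; // treat the byte as unsigned, 0 to 255
            if (isUnreserved(b)) {
                out.append((char) b);
            } else {
                out.append('%').append(HEX[b >> 4]).append(HEX[b & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Hi there+Hello there")); // Hi%20there%2BHello%20there
    }
}
```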
The problem you are having with pluses comes from RFC 1866 (the HTML 2.0 specification), section 8.2.1, item 1: "The form field names and values are escaped: space characters are replaced by `+', and then reserved characters are escaped". This way of encoding form data is also given in later HTML specifications; look for the relevant sections about application/x-www-form-urlencoded.
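The decoding side of that rule can be seen with java.net.URLDecoder (Charset overload, Java 10+), which follows the same form-urlencoded convention: an unescaped + is read back as a space, while %2B survives as a literal plus. That Gmail decodes the body parameter this way is an inference from the behaviour described in the question, not something confirmed here. The class and method names are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class FormDecoding {
    // Decodes a query-string value using application/x-www-form-urlencoded rules.
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // An unescaped "+" turns into a space, as the question observed.
        System.out.println(decode("Hi there+Hello there"));   // Hi there Hello there
        // Escaping the plus as %2B preserves it.
        System.out.println(decode("Hi+there%2BHello+there")); // Hi there+Hello there
    }
}
```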
Just to add this to the list:
Uri.EscapeUriString("Hi there+Hello there") // Hi%20there+Hello%20there
Uri.EscapeDataString("Hi there+Hello there") // Hi%20there%2BHello%20there
See https://stackoverflow.com/a/34189188/98491
Usually you want to use EscapeDataString which does it right.
Generally, if you use .NET APIs, new Uri("someproto:with+plus").LocalPath or .AbsolutePath keeps the plus character in the URL (the same "someproto:with+plus" string),
but Uri.EscapeDataString("with+plus") escapes the plus character and produces "with%2Bplus".
To be consistent, I would recommend always escaping the plus character as "%2B" and using that everywhere; then there is no need to guess how each component will interpret your plus character.
I'm not sure why decoding an unescaped '+' would produce a space character ' ', but apparently that is how some components behave.
