"<" character in JSON data is serialized to \u003c - asp.net

I have a JSON object where the value of one element is a string. In this string there are the characters "<RPC>". I take this entire JSON object and in my ASP.NET server code, I perform the following to take the object named rpc_response and add it to the data in a POST response:
var serializer = new System.Web.Script.Serialization.JavaScriptSerializer();
HttpContext.Current.Response.AddHeader("Pragma", "no-cache");
HttpContext.Current.Response.AddHeader("Cache-Control", "private, no-cache");
HttpContext.Current.Response.AddHeader("Content-Disposition", "inline; filename=\"files.json\"");
HttpContext.Current.Response.Write(serializer.Serialize(rpc_response));
HttpContext.Current.Response.ContentType = "application/json";
HttpContext.Current.Response.StatusCode = 200;
After the object is serialized, I receive it on the other end (not a web browser), and that particular string looks like: \u003cRPC\u003e.
What can I do to prevent these (and other) characters from being encoded this way, while still being able to serialize my JSON object?

The characters are being encoded "properly"! [1] Use a working JSON library to correctly access the JSON data - it is a valid JSON encoding.
Escaping these characters prevents HTML injection via JSON - and makes the JSON XML-friendly. That is, even if the JSON is emitted directly into JavaScript (as is done fairly often, since JSON is a valid [2] subset of JavaScript), it cannot be used to terminate the <script> element early, because the relevant characters (e.g. <, >) are encoded within the JSON itself.
The standard JavaScriptSerializer does not have the ability to change this behavior. Such escaping might be configurable (or different) in the Json.NET implementation - but it shouldn't matter, because a valid JSON client/library must understand the \u escapes.
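For illustration, here is a minimal sketch (the RpcResponse class and its Command property are placeholders, not the asker's actual type) showing that the \u escapes disappear as soon as a conforming JSON library parses the text:

using System;
using System.Web.Script.Serialization;

public class RpcResponse
{
    public string Command { get; set; }
}

public class EscapeDemo
{
    public static void Main()
    {
        var serializer = new JavaScriptSerializer();

        // Serializing: "<" and ">" are emitted as \u003c and \u003e in the JSON text.
        string json = serializer.Serialize(new RpcResponse { Command = "<RPC>" });
        Console.WriteLine(json); // {"Command":"\u003cRPC\u003e"}

        // Deserializing with any conforming JSON parser turns the escapes back into the original characters.
        var back = serializer.Deserialize<RpcResponse>(json);
        Console.WriteLine(back.Command); // <RPC>
    }
}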
[1] From RFC 4627, The application/json Media Type for JavaScript Object Notation (JSON):
Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point ...
See also C# To transform Facebook Response to proper encoded string (which is also related to the JSON escaping).
[2] There is a rare case when this does not hold, but ignoring (or accounting for) that...

Related

How does Servlet HttpServletResponse::setCharacterEncoding() work?

I have learned that in general, Java uses UTF-16 as the internal String representation.
My question is what actually happens when composing a response in Java and applying different char encoding, e.g. response.setCharacterEncoding("ISO-8859-1").
Does it actually convert the response's body bytes from UTF-16 to ISO-8859-1 or it just adds some metadata to the response object?
I'm assuming you're talking about a class that works along the lines of HttpServletResponse. If that's the case, then yes, it changes the body of the response if you call getWriter: the writer returned by that call has to convert any strings written to it into bytes, and the character encoding is used for that conversion.
If you've set the content type, then setting the content encoding will also make that information available via the Content-Type header. As per the ServletResponse docs:
Calling setContentType(java.lang.String) with the String of text/html and calling this method with the String of UTF-8 is equivalent with calling setContentType with the String of text/html; charset=UTF-8.

Servlet stripping parameter values because of # character

My URL is http://175.24.2.166/download?a=TOP#0;ONE=1;TWO2.
How should I encode the parameter so that when I print the parameter in the Servlet, I get the value in its entirety? Currently when I print the value by using request.getParameter("a") I get the output as TOP instead of TOP#0;ONE=1;TWO2.
You should encode it like this: http://175.24.2.166/download?a=TOP%230%3BONE%3D1%3BTWO2. There are plenty of encoders in Java; you can try URLEncoder, or use an online encoder to experiment.
This is known as the "fragment identifier".
As mentioned on Wikipedia:
The fragment identifier introduced by a hash mark # is the optional last part of a URL for a document. It is typically used to identify a portion of that document.
The part after the # is info for the client. Put everything your client needs here.
You need to encode your query string. In JavaScript you can use the encodeURIComponent() function, which encodes a URI component by escaping its special characters.
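Since the rest of this page is .NET-focused, here is the same idea sketched in C# (the URL is just the example value from the question); Uri.EscapeDataString percent-encodes everything that is not an unreserved character:

using System;

public class QueryEncodeDemo
{
    public static void Main()
    {
        // '#', ';' and '=' are escaped so they reach the servlet as data, not as URL syntax.
        string value = "TOP#0;ONE=1;TWO2";
        string encoded = Uri.EscapeDataString(value);

        Console.WriteLine(encoded);                                    // TOP%230%3BONE%3D1%3BTWO2
        Console.WriteLine("http://175.24.2.166/download?a=" + encoded);
    }
}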

.Net Uri Encoding RFC 2396 vs RFC 3986

First, some quick background... As part of an integration with a third party vendor, I have a C# .Net web application that receives a URL with a bunch of information in the query string. That URL is signed with an MD5 hash and a shared secret key. Basically, I pull in the query string, remove their hash, perform my own hash on the remaining query string, and make sure mine matches the one that was supplied.
I'm retrieving the Uri in the following way...
Uri uriFromVendor = new Uri(Request.Url.ToString());
string queryFromVendor = uriFromVendor.Query.Substring(1); //Substring to remove question mark
My issue is stemming from query strings that contain special characters like an umlaut (ü). The vendor is calculating their hash based on the RFC 2396 representation, which is %FC. My C# .Net app is calculating its hash based on the RFC 3986 representation, which is %C3%BC. Needless to say, our hashes don't match, and I throw my errors.
Strangely, the documentation for the Uri class in .Net says that it should follow RFC 2396 unless otherwise set to RFC 3986, but I don't have the entry in my web.config file that they say is required for this behavior.
How can I force the Uri constructor to use the RFC 2396 convention?
Failing that, is there an easy way to convert the RFC 3986 octet pairs to RFC 2396 octets?
Nothing to do with your question, but why are you creating a new Uri here? You can just do string queryFromVendor = Request.Url.Query.Substring(1); – atticae
+1 for atticae! I went back to try removing the extraneous Uri I was creating and suddenly, the string had the umlaut encoded as UTF-8 instead of UTF-16.
At first, I didn't think this would work. Somewhere along the line, I had tried retrieving the url using Request.QueryString, but this was causing the umlaut to come through as %ufffd which is the � character. In the interest of taking a fresh perspective, I tried atticae's suggestion and it worked.
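For reference, a minimal before/after sketch of that change (the hash comparison itself is omitted):

// Before: constructing a second Uri from the string form re-encoded the query,
// changing the percent-encoding of characters such as the umlaut.
Uri uriFromVendor = new Uri(Request.Url.ToString());
string queryFromVendor = uriFromVendor.Query.Substring(1); // Substring removes the leading '?'

// After: use the Uri that ASP.NET already built for the request.
string queryFromVendorFixed = Request.Url.Query.Substring(1);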
I'm pretty sure the answer has to do with something I read here.
C# uses UTF-16 in all its strings, with tools to encode when it comes to dealing with streams and files that bring us onto...
ASP.NET uses UTF-8 by default, and it's hard to think of a time when it isn't a good choice...
My problems stemmed from here...
Uri uriFromVendor = new Uri(Request.Url.ToString());
By taking the Request.Url uri and creating another uri, it was encoding as the C# standard UTF-16. By using the original uri, it remained in the .Net standard UTF-8.
Thanks to all for your help.
I'm wondering if this is a bit of a red herring:
I say this because FC is the UTF-16 code unit (00FC) for the u with umlaut, while C3 BC is the UTF-8 representation.
I wonder if one of the System.Text.Encoding methods to convert the source data into a normal .Net string might help.
This question might be of interest too: Encode and Decode rfc2396 URLs
I don't know about the standard encoding for Uri constructors, but if everything else fails you could always decode the URL yourself and encode it in whatever encoding you like.
The HttpUtility class has UrlDecode() and UrlEncode() methods which let you specify the System.Text.Encoding as a second parameter.
For example:
string decodedQueryString = HttpUtility.UrlDecode(Request.Url.Query.Substring(1));
string encodedQueryString = HttpUtility.UrlEncode(decodedQueryString, System.Text.Encoding.GetEncoding("utf-16"));
// calc hash here

ASP.NET Base64 string corruption

I am passing an object from one asp.net page to another. I'm encoding the object as a Base64 string and passing it as a POST parameter. However, when the receiving page reads the POST value, if there is a + sign in the Base64 string, it is being replaced with a line break. For example:
...AABDEDS+DFEAED...
becomes
...AABDEDS
DFEAED...
I compared the Base64 string immediately after encoding in the sending page to the string immediately before decoding in the receiving page, and that is the only difference. I tried HTML-encoding the Base64 string prior to writing it to the request stream, but that had no effect, so it seems to be an issue on the receiving end.
Any ideas?
Use UrlEncode. The + is a reserved character and needs to be encoded.
When you pass the base64 string in the parameter, you need to URL Encode it (so the characters come across properly). Use:
Server.UrlEncode(base64String);
HttpServerUtility.UrlEncode Method (String) (System.Web)
The + symbol is a special URL character that, on its own, is decoded as a space.
You'll need to Server.UrlEncode your base64 string on one side (which will turn the plus into a %2B) and Server.UrlDecode it on the other side.
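Putting those answers together, a minimal sketch of the round trip (the Base64 fragment is the one from the question; in a real page the decode side is usually done for you by Request.Form):

using System;
using System.Web;

public class Base64PostDemo
{
    public static void Main()
    {
        string base64 = "AABDEDS+DFEAED";

        // Sending side: percent-encode before writing the POST body, so '+' travels as %2b
        // instead of being decoded back into a space on arrival.
        string encoded = HttpUtility.UrlEncode(base64); // AABDEDS%2bDFEAED

        // Receiving side: decoding restores the original Base64 string intact.
        string decoded = HttpUtility.UrlDecode(encoded);
        Console.WriteLine(decoded == base64); // True
    }
}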

How to stop .NET from encoding an XML string with XML.Serialization

I am working with some Xml Serialization in ASP.NET 2.0 in a web service. The issue is that I have an element which is defined such as this:
<System.Xml.Serialization.XmlElementAttribute(Form:=System.Xml.Schema.XmlSchemaForm.Unqualified, IsNullable:=True)> _
Public Property COMMENTFIELD() As String
    Get
        Return CommentField ' This is a string
    End Get
    Set(ByVal value As String)
        CommentField = value
    End Set
End Property
Elsewhere in code I am constructing a comment and appending &#xA; as a line-break (according to the rules of the web service we are submitting to) between each 'comment', like this. (Please keep in mind that &#xA; is a valid XML entity representing character 10, Line Feed I believe.)
XmlObject.COMMENTFIELD = sComment1 & "&#xA;" & sComment2
The issue is that .NET tries to do us a favor and encodes the & in the comment string, which ends up sending the destination web service this: &amp;#xA;, which obviously isn't what we want.
Here is what currently happens:
XmlObject.COMMENTFIELD = sComment1 & "&#xA;" & sComment2
Output:
<COMMENTFIELD>comment1 &amp;#xA; comment2</COMMENTFIELD>
Output I NEED:
<COMMENTFIELD>comment1 &#xA; comment2</COMMENTFIELD>
The Question Is: How do I force the .NET runtime to not try and do me any favors in regards to encoding data that I already know is XML compliant and escaped already (btw sComment1 and sComment2 would already be escaped). I'm used to taking care of my XML, not depending on something magical that happens to escape all my data behind my back!
I need to be able to pass valid XML into the COMMENTFIELD property without .NET encoding the data I give it (as it is already XML). I need to know how to tell .NET that the data it is receiving is an XML String, not a normal string that needs escaped.
If you look at the XML spec, section 2.4, you see that the & character in an element's text always indicates the start of something escaped, so if you want to send a literal & character it needs to be escaped itself, e.g. as &amp;. So .Net is converting the literal string you gave it into valid XML.
If you really want the web service to receive the literal text &#xA;, then .NET is doing the correct thing. When the web service processes the XML it will convert it back to the same literal string you supplied on your end.
On the other hand, if you want to send the remote web service a string with a newline, you should just construct the string with the newline:
XmlObject.COMMENTFIELD = sComment1 & vbLf & sComment2 ' vbLf is character 10 (Line Feed)
.Net will do the correct thing to make sure this is passed correctly on the wire.
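A minimal C# sketch of that idea (the CommentDoc type and its property are placeholders for the asker's generated class): assign a real line feed and let the serializer worry about the wire format; the newline survives the round trip.

using System;
using System.IO;
using System.Xml.Serialization;

public class CommentDoc
{
    public string COMMENTFIELD { get; set; }
}

public class NewlineRoundTrip
{
    public static void Main()
    {
        var serializer = new XmlSerializer(typeof(CommentDoc));
        var doc = new CommentDoc { COMMENTFIELD = "comment1\ncomment2" }; // a real line feed, no entity

        var writer = new StringWriter();
        serializer.Serialize(writer, doc);
        string xml = writer.ToString(); // the element text contains a legal XML line break

        var back = (CommentDoc)serializer.Deserialize(new StringReader(xml));
        Console.WriteLine(back.COMMENTFIELD.Contains("\n")); // True: the newline came through
    }
}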
It is probably dangerous to mix two different encoding conventions within the same string. Since you have your own convention I recommend explicitly encoding the whole string when it is ready to send and explicitly decoding it on the receiving end.
Try the HttpServerUtility.HtmlEncode Method (System.Web).
+tom
