How to stop .NET from encoding an XML string with XML.Serialization - asp.net

I am working with some Xml Serialization in ASP.NET 2.0 in a web service. The issue is that I have an element which is defined such as this:
<System.Xml.Serialization.XmlElementAttribute(Form:=System.Xml.Schema.XmlSchemaForm.Unqualified, IsNullable:=True)> _
Public Property COMMENTFIELD() As String
Get
Return CommentField ' This is a string
End Get
Set(ByVal value as String)
CommentField = value
End Set
End Property
Elsewhere in code I am constructing a comment and appending
as a line-break (according to the rules of the web service we are submitting to) between each 'comment', like this: (Please keep in mind that
is a valid XML entity representing character 10 (Line Feed I believe).
XmlObject.COMMENTFIELD = sComment1 & "
" & sComment2
The issue is that .NET tries to do us a favor and encode the & in the comment string which ends up sending the destination web service this: &#xA;, which obviously isn't what we want.
Here is what currently happens:
XmlObject.COMMENTFIELD = sComment1 & "
" & sComment2
Output:
<COMMENTFIELD>comment1 &#xA comment2</COMMENTFIELD>
Output I NEED:
<COMMENTFIELD>comment1
comment2</COMMENTFIELD>
The Question Is: How do I force the .NET runtime to not try and do me any favors in regards to encoding data that I already know is XML compliant and escaped already (btw sComment1 and sComment2 would already be escaped). I'm used to taking care of my XML, not depending on something magical that happens to escape all my data behind my back!
I need to be able to pass valid XML into the COMMENTFIELD property without .NET encoding the data I give it (as it is already XML). I need to know how to tell .NET that the data it is receiving is an XML String, not a normal string that needs escaped.

If you look at the XML spec section 2.4, you see that the & character in an element's text always used to indicate something escaped, so if you want to send an & character, it needs to be escaped, e.g., as & So .Net is converting the literal string you gave it into valid XML.
If you really want the web service to receive the literal text &, then .NET is doing the correct thing. When the web service processes the XML it will convert it back to the same literal string you supplied on your end.
On the other hand, if you want to send the remote web service a string with a newline, you should just construct the string with the newline:
XmlObject.COMMENTFIELD = sComment1 & "\n" & sComment2
.Net will do the correct thing to make sure this is passed correctly on the wire.

It is probably dangerous to mix two different encoding conventions within the same string. Since you have your own convention I recommend explicitly encoding the whole string when it is ready to send and explicitly decoding it on the receiving end.
Try the HttpServerUtility.HtmlEncode Method (System.Web) .
+tom

Related

"<" character in JSON data is serialized to \u003c

I have a JSON object where the value of one element is a string. In this string there are the characters "<RPC>". I take this entire JSON object and in my ASP.NET server code, I perform the following to take the object named rpc_response and add it to the data in a POST response:
var serializer = new System.Web.Script.Serialization.JavaScriptSerializer();
HttpContext.Current.Response.AddHeader("Pragma", "no-cache");
HttpContext.Current.Response.AddHeader("Cache-Control", "private, no-cache");
HttpContext.Current.Response.AddHeader("Content-Disposition", "inline; filename=\"files.json\"");
HttpContext.Current.Response.Write(serializer.Serialize(rpc_response));
HttpContext.Current.Response.ContentType = "application/json";
HttpContext.Current.Response.StatusCode = 200;
After the object is serialized, I receive it on the other end (not a web browser), and that particular string looks like: \u003cRPC\u003e.
What can I do to prevent these (and other) characters from not being encoded properly, still being able to serialize my JSON object?
The characters are being encoded "properly"!1 Use a working JSON library to correctly access the JSON data - it is a valid JSON encoding.
Escaping these characters prevents HTML injection via JSON - and makes the JSON XML-friendly. That is, even if the JSON is emited directly into JavaScript (as is done fairly often as JSON is a valid2 subset of JavaScript), it cannot be used to terminate the <script> element early because the relevant characters (e.g. <, >) are encoded within JSON itself.
The standard JavaScriptSerializer does not have the ability to change this behavior. Such escaping might be configurable (or different) in the Json.NET implementation - but, it shouldn't matter because a valid JSON client/library must understand the \u escapes.
1 From RFC 4627: The application/json Media Type for JavaScript Object Notation (JSON),
Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point ..
See also C# To transform Facebook Response to proper encoded string (which is also related to the JSON escaping).
2 There is a rare case when this does not hold, but ignoring (or accounting for) that..

Server side call to webservice in classic ASP

I've .NET webservice, which takes a encoded html-string as a parameter, decodes the string and creates a PDF from the html. I want to make a synchronous server side call to the webservice from a classic asp webpage. It works fine if use a plain text string (with no html tags), but when I send a encoded html string the webservice it seems that the string is empty when it reaches the webservice.
The webservice is working fine when I call it from client side, with both plain text string and an encoded html string.
My code looks like this:
Private Sub SaveBookHtmlToPdf(pHtml, pShopId)
Set oXMLHTTP = CreateObject("Msxml2.ServerXMLHTTP.6.0")
Dim strEnvelope
strEnvelope = "pShopId=" & pShopId & "&pEncodedHtml=" & Server.HTMLEncode(pHtml)
Call oXMLHTTP.Open("POST", "https://mydomain.dk:4430/PdfWebservice.asmx/SaveBookToPdf", false)
Call oXMLHTTP.SetRequestHeader("Content-Type","application/x-www-form-urlencoded")
Call oXMLHTTP.Send(strEnvelope)
Set oXMLHTTP = Nothing
End Sub
It smells like some kind of security issue on the server. It's working when posting a asynchronous call from the client side, but not when it comes from server side - it seems that the encoded html string is somehow not allowed in a server side call to the webservice.
Anyone who know how to solve this tricky problem?
This looks all wrong to me:
Server.HTMLEncode(pHtml)
Its quite common for developers to get confused between HTML encoding and URL encoding even though they are quite different. You are posting data that needs to be URL encoded. Hence your code should use URLEncode instead:
strEnvelope = "pShopId=" & pShopId & "&pEncodedHtml=" & Server.URLEncode(pHtml)
Edit:
One thing that URLEncode does that may not be compatible with a URLEncoded post is it converts space to "+" instead of "%20". Hence a more robust approach might be:
strEnvelope = "pShopId=" & pShopId & "&pEncodedHtml=" & Replace(Server.URLEncode(pHtml), "+", "%20")
Another issue to watch out for is that the current value of Response.CodePage will influence how the URLEncode encodes non-ASCII characters. Typically .NET does things by default in UTF-8. Hence you will also want to make sure that your Response.CodePage is set to 65001.
Response.CodePage = 65001
strEnvelope = "pShopId=" & pShopId & "&pEncodedHtml=" & Replace(Server.URLEncode(pHtml), "+", "%20")
This may or may not help but I use a handy SOAP Class for Classic ASP which solved a few problems I was having doing it manually. Your code would be something like this:
Set cSOAP = new SOAP
cSOAP.SOAP_StartRequest "https://mydomain.dk:4430/PdfWebservice.asmx", "", "SaveBookToPdf"
cSOAP.SOAP_AddParameter "pShopId", pShopId
cSOAP.SOAP_AddParameter "pEncodedHtml", Server.HTMLEncode(pHtml)
cSOAP.SOAP_SendRequest
' result = cSOAP.SOAP_GetResult("result")
You will probably need to set your namespace for it to work ("" currently), and uncomment the 'on error resume next' lines from the class to show errors.
AnthonyWJones made the point about URL encoding and HTML encoding, and the original problem being experienced is likely a combine of the two, a race condition if you will. While is was considered answered, it partially wasn't, and hopefully this answers the cause of the effect.
So, as the message get HTMLEncoded, the html entities for the tags become such '<' = '<'.
And as you may know, in URLEncoding, &'s delimit parameters; thus the first part of this data strEnvelope = "pShopId=" & pShopId & "&pEncodedHtml=" & Server.HTMLEncode(pHtml) upto the "&pEncodedHtml" bit, is fine. But then "<HTML>..." is added as the message, with unencoded &'s...and the receiving server likely is delimiting on them and basically truncating "&pEncodedHtml=" as a null assign: "&pEncodedHtml=<HTML>... ." The delimiting would be done on all &'s found in the URL.
So, as far as the server is concerned, the data for parameter &pEncodedHtml was null, and following it were now several other parameters that were considered cruft, that it likely ignored, which just happened to actually be your message.
Hope this provides additional info on issues of its like, and how to correct.

A Minor, but annoying niggle - Why does ASP.Net set SQL Server Guids to lowercase?

I'm doing some client-side stuff with Javascript/JQuery with .Net controls which expose their GUID/UniqueIdentifier IDs on the front end to allow them to be manipulated. During debugging something is driving me crazy: The GUIDs in the db are stored in uppercase, however by the time they make it to the front end they're in lowercase.
This means I can't quickly copy and paste IDs into the browser's console to execute JS on the fly when devving/debugging. I have found a just-about-workable way of doing this but I was wondering if anyone knew why this behaviour is the case and whether there is any way of forcing GUIDs to stay uppercase.
According to MSDN docs the Guid.ToString() method will produce lowercase string.
As to why it does that - apparently RFC 4122 states it should be this way.
The hexadecimal values "a" through "f" are output as lower case characters and are case insensitive on input.
Also check this question on SO - net-guid-uppercase-string-format.
So the best thing you can do is to call ToUpper() on your GUID strings, and add extension method as showed in the other answer.
If you're using an Eval template, then I'd see if you can do this via an Extension method.
something like
public static string ToUpperString(this Guid guid, string format = "")
{
string output = guid.ToString(format);
return output.ToUpper();
}
And then in your Eval block,
myGuid.ToUpperString("B")
Or however you need it to look.
I'm on my Mac at the moment so I can't test that, but it should work if you've got the right .Net version.

updating values in web.config

I need to store an escaped html string in a key in web.config using the
KeyValueConfigurationElement.Save method built into framework 3.5. But when I try to do so,
it keeps escaping my ampersands.
Code looks like this:
strHTML = DecodeFTBInput(FTB1.Text)
FTB1.Text is a string of HTML, like this: <b><font color="#000000">Testing</font></b>
DecodeFTPInput uses the String.Replace() method to change < and > to < and >, and " to ".
Given the above string and function, let's say strHTML now contains the following:
<b><font color="#000000">Testing</font></b>
Of course I can manually edit web.config to store the correct value, but I need the authenticated admin user to be able to change the html themselves.
The problem is that when I try to save this string into its key in web.config, it escapes all ampersands
as & which ruins the string.
How can I get around this?
web.config is an XML file, so when it writes values there the .NET Framework stores strings using HTML encoding, replacing the < > & characters with <, > and &, and much more besides.
You'll need to stop your DecodeFTPInput method from HTML encoding the string if you want the HTML in the web.config file to be editable. Otherwise you'll be HTML encoding twice, which isn't the result you want!

how to handle special characters in strings in a web service?

To show this fundamental issue in .NET and the reason for this question, I have written a simple test web service with one method (EditString), and a consumer console app that calls it.
They are both standard web service/console applications created via File/New Project, etc., so I won't list the whole code - just the methods in question:
Web method:
[WebMethod]
public string EditString(string s, bool useSpecial)
{
return s + (useSpecial ? ((char)19).ToString() : "");
}
[You can see it simply returns the string s if useSpecial is false. If useSpecial is true, it returns s + char 19.]
Console app:
TestService.Service1 service = new SCTestConsumer.TestService.Service1();
string response1 = service.EditString("hello", false);
Console.WriteLine(response1);
string response2 = service.EditString("hello", true); // fails!
Console.WriteLine(response2);
[The second response fails, because the method returns hello + a special character (ascii code 19 for argument's sake).]
The error is:
There is an error in XML document (1, 287)
Inner exception: "'', hexadecimal value 0x13, is an invalid character. Line 1, position 287."
A few points worth mentioning:
The web method itself WORKS FINE when browsing directly to the ASMX file (e.g. http://localhost:2065/service1.asmx), and running the method through this (with the same parameters as in the console application) - i.e. displays XML with the string hello + char 19.
Checking the serialized XML in other ways shows the special character is being encoded properly (the SERVER SIDE seems to be ok which is GOOD)
So it seems the CLIENT SIDE has the issue - i.e. the .NET generated proxy class code doesn't handle special characters
This is part of a bigger project where objects are passed in and out of the web methods - that contain string attributes - these are what need to work properly. i.e. we're de/serializing classes.
Any suggestions for a workaround and how to implement it?
Or have I completely missed something really obvious!!?
PS. I've not had much luck with getting it to use CDATA tags (does .NET support these out of the box?).
You will need to use byte[] instead of strings.
I am thinking of some options that may help you. You can take the route using html entities instead of char(19). or as you said you may want to use CDATA.
To come up with a clean solution, you may not want to put the whole thing in CDATA. I am not sure why you think it may not be supported in .NET. Are you saying this in the context of serialization?

Resources