how to handle special characters in strings in a web service?

how to handle special characters in strings in a web service? - asp.net

To show this fundamental issue in .NET and the reason for this question, I have written a simple test web service with one method (EditString), and a consumer console app that calls it.
They are both standard web service/console applications created via File/New Project, etc., so I won't list the whole code - just the methods in question:
Web method:
[WebMethod]
public string EditString(string s, bool useSpecial)
{
return s + (useSpecial ? ((char)19).ToString() : "");
}
[You can see it simply returns the string s if useSpecial is false. If useSpecial is true, it returns s + char 19.]
Console app:
TestService.Service1 service = new SCTestConsumer.TestService.Service1();
string response1 = service.EditString("hello", false);
Console.WriteLine(response1);
string response2 = service.EditString("hello", true); // fails!
Console.WriteLine(response2);
[The second response fails, because the method returns hello + a special character (ascii code 19 for argument's sake).]
The error is:
There is an error in XML document (1, 287)
Inner exception: "'', hexadecimal value 0x13, is an invalid character. Line 1, position 287."
A few points worth mentioning:
The web method itself WORKS FINE when browsing directly to the ASMX file (e.g. http://localhost:2065/service1.asmx), and running the method through this (with the same parameters as in the console application) - i.e. displays XML with the string hello + char 19.
Checking the serialized XML in other ways shows the special character is being encoded properly (the SERVER SIDE seems to be ok which is GOOD)
So it seems the CLIENT SIDE has the issue - i.e. the .NET generated proxy class code doesn't handle special characters
This is part of a bigger project where objects are passed in and out of the web methods - that contain string attributes - these are what need to work properly. i.e. we're de/serializing classes.
Any suggestions for a workaround and how to implement it?
Or have I completely missed something really obvious!!?
PS. I've not had much luck with getting it to use CDATA tags (does .NET support these out of the box?).

You will need to use byte[] instead of strings.

I am thinking of some options that may help you. You can take the route using html entities instead of char(19). or as you said you may want to use CDATA.
To come up with a clean solution, you may not want to put the whole thing in CDATA. I am not sure why you think it may not be supported in .NET. Are you saying this in the context of serialization?

Related

Character encoding issue with Spring MVC and HTML form

I'm working with spring mvc. I've set up a web form that has two simple text inputs. On controller, I use #ModelAttribute to let spring build the bean from the web form.
The problem comes when user puts on those text fields specials characters, like 酒店 and this kind of stuff, spring doesn't read it as utf-8, and they become the usual bad-encoded string.
I've checked web.xml and there's the utf-8 encoding filter, all pages are marqued as utf-8 and browser is sending right charset headers. Any idea on what's going on?

You may want to check this out:
http://forum.springsource.org/showthread.php?81858-ResponseBody-and-UTF-8
The short of it is that if you are using annotated methods then the messageconverter being used has a default character set. You can change this setting in your web.xml by setting the supported media types.
However, if your service doesn't like that media type, you may get a 406 error. You can create your own string message converter and set the default encoding, but there is no easy way with the built in HttpStringMessageConverter.
Alternately you can re-encode a string to a different character set:
String newresponse = new String(responseString.getBytes("ISO-8859-1"), "UTF-8");
You may also want to check out the related question here: How to get UTF-8 working in Java webapps?

the solution is simple by add produces = "text/plain;charset=UTF-8" to request mapping you can force spring mvc to encode the returned text.

double escape sequence inside a url : The request filtering module is configured to deny a request that contains a double escape sequence

On my ASP.NET MVC application, I am trying to implement a URL like below :
/product/tags/for+families
When I try to run my application with default configurations, I am getting this message with 404.11 Response Code :
HTTP Error 404.11 - Not Found
The request filtering module is configured to deny a request that
contains a double escape sequence.
I can get around with this error by implementing the below code inside my web.config :
<system.webServer>
<security>
<requestFiltering allowDoubleEscaping="true" />
</security>
</system.webServer>
So, now I am not getting any 404.11.
What I am wondering is that what kind of security holes I am opening with this implementation.
BTW, my application is under .Net Framework 4.0 and running under IIS 7.5.

The security holes that you might open up have to do with code injection - HTML injection, JavaScript injection or SQL injection.
The default settings protect you from attacks semi-efficiently by not allowing common injection strategies to work. The more default security you remove, the more you have to think about what you do with the input provided through URLs, GET request querystrings, POST request data, HTTP headers and so on...
For instance, if you are building dynamic SQL queries based on the id parameter of your action method, like this:
public ActionResult Tags(string id)
{
var sql = "SELECT * FROM Tags Where tagName = '" + id + "'";
// DO STUFF...
}
(...which is NOT a good idea), the default protection, put in place by the .NET framework, might stop some of the more dangerous scenarios, like the user requesting this URL:
/product/tags/1%27;drop%20table%20Tags;%20--
The whole idea is to treat every part of urls and other inputs to action methods as possible threats. The default security setting does provide some of that protection for you. Each default security setting you change opens up for a little more potential badness that you need to handle manually.
I assume that you are not building SQL queries this way. But the more sneaky stuff comes when you store user input in your database, then later displaying them. The malevolent user could store JavaScript or HTML in your database that go out unencoded, which would in turn threaten other users of your system.

Security Risk
The setting allowDoubleEscaping only applies to the path (cs-uri-stem) and is best explained by OWASP Double Encoding. The technique is used to get around security controls by URL Encoding the request twice. Using your URL as an example:
/product/tags/for+families --> /product/tags/for%2Bfamilies --> /product/tags/for%252Bfamilies
Suppose there are security controls specifically for /product/tags/for+families. A request arrives for /product/tags/for%252Bfamilies which is the same resource though is unchecked by the aforementioned security controls. I used the generalized term of security controls because they could be anything such as requiring an authenticated user, checking for SQLi, etc.
Why Does IIS Block?
The plus sign (+) is a reserved character per RFC2396:
Many URI include components consisting of or delimited by, certain
special characters. These characters are called "reserved", since
their usage within the URI component is limited to their reserved
purpose. If the data for a URI component would conflict with the
reserved purpose, then the conflicting data must be escaped before
forming the URI.
reserved = ";" | "/" | "?" | ":" | "#" | "&" | "=" | "+" |
"$" | ","
Wade Hilmo has an excellent post titled How IIS blocks characters in URLs. There's lots of information and background provided. The part specifically for the plus sign is as follows:
So allowDoubleEscaping/VerifyNormalization seems pretty straightforward. Why did I say that it causes confusion? The issue is when a ‘+’ character appears in a URL. The ‘+’ character doesn’t appear to be escaped, since it does not involve a ‘%’. Also, RFC 2396 notes it as a reserved character that can be included in a URL when it’s in escaped form (%2b). But with allowDoubleEscaping set to its default value of false, we will block it even in escaped form. The reason for this is historical: Way back in the early days of HTTP, a ‘+’ character was considered shorthand for a space character. Some canonicalizers, when given a URL that contains a ‘+’ will convert it to a space. For this reason, we consider a ‘+’ to be non-canonical in a URL. I was not able to find any reference to a RFC that calls out this ‘+’ treatment, but there are many references on the web that talk about it as a historical behavior.
From my own experience I know that when IIS logs a request spaces are substituted with a plus sign. Having a plus sign in the name may cause confusion when parsing logs.
Solution
There are three ways to fix this and two ways to still use the plus sign.
allowDoubleEscaping=true - This will allow double escaping for your entire website/application. Depending on the content, this could be undesirable to say the least. The following command will set allowDoubleEscaping=true.
appcmd.exe set config "Default Web Site" -section:system.webServer/security/requestFiltering /allowDoubleEscaping:True
alwaysAllowedUrls - Request Filtering offers a whitelist approach. By adding that URL path to alwaysAllowedUrls, the request will not be checked by any other Request Filtering settings and continue on in the IIS request pipeline. The concern here is that Request Filtering will not check the request for:
Request Limits: maxContentLength, maxUrl, maxQueryString
Verbs
Query - query string parameters will not be checked
Double Escaping
High Bit Characters
Request Filtering Rules
Request Header Limits
The following command will add /product/tags/for+families to alwaysAllowedUrls on the Default Web Site.
appcmd.exe set config "Default Web Site" -section:system.webServer/security/requestFiltering /+"alwaysAllowedUrls.[url='/product/tags/for+families']"
Rename - yes, just rename the file/folder/controller/etc. if possible. This is the easiest solution.

I have made a work arround for this .
so when you want to put the encripted string inside the url for(IIS)
you have to clean it from dirty :{ ";", "/", "?", ":", "#", "&", "=", "+", "$", "," };
and when you want to decript it and use it again , you have to make it dirty again before decript it (to get the wanted result) .
Here is my code , i hope it helps somebody :
public static string cleanUpEncription(string encriptedstring)
{
string[] dirtyCharacters = { ";", "/", "?", ":", "#", "&", "=", "+", "$", "," };
string[] cleanCharacters = { "p2n3t4G5l6m","s1l2a3s4h","q1e2st3i4o5n" ,"T22p14nt2s", "a9t" , "a2n3nd","e1q2ua88l","p22l33u1ws","d0l1ar5","c0m8a1a"};
foreach (string dirtyCharacter in dirtyCharacters)
{
encriptedstring=encriptedstring.Replace(dirtyCharacter, cleanCharacters[Array.IndexOf(dirtyCharacters, dirtyCharacter)]);
}
return encriptedstring;
}
public static string MakeItDirtyAgain(string encriptedString)
{
string[] dirtyCharacters = { ";", "/", "?", ":", "#", "&", "=", "+", "$", "," };
string[] cleanCharacters = { "p2n3t4G5l6m", "s1l2a3s4h", "q1e2st3i4o5n", "T22p14nt2s", "a9t", "a2n3nd", "e1q2ua88l", "p22l33u1ws", "d0l1ar5", "c0m8a1a" };
foreach (string symbol in cleanCharacters)
{
encriptedString = encriptedString.Replace(symbol, dirtyCharacters[Array.IndexOf(cleanCharacters,symbol)]);
}
return encriptedString;
}

Encode the encrypted string separated:
return HttpServerUtility.UrlTokenEncode(Encoding.UTF8.GetBytes("string"));
Decode the encrypted string separated:
string x = Encoding.UTF8.GetString(HttpServerUtility.UrlTokenDecode(id));

So I ran into this when I was calling an API from an MVC app. Instead of opening the security hole, I modified my path.
First off, I recommend NOT disabling this setting. It is more appropriate to modify the design of the application/resource (e.g. encode the path, pass the data in a header or in the body).
Although this is an older post, I thought I would share how you could resolve this error if you are receiving this from a call to an API by using HttpUtility.UrlPathEncode method in System.Web.
I use RestSharp for making calls out, so my example is using the RestRequest:
var tags = new[] { "for", "family" };
var apiRequest = new RestRequest($"product/tags/{HttpUtility.UrlPathEncode(string.Join("+", tags))}");
This produces a path equal to:
/product/tags/for%2Bfamilies
On another note, do NOT build a dynamic query based on a user's inputs. You SHOULD always use a SqlParameter. Also, it is extremely important from a security perspective to return the values with the appropriate encoding to prevent injection attacks.

Is IIS performing an illegal character substitution? If so, how to stop it?

Context: ASP.NET MVC running in IIS, with a a UTF-8 %-encoded URL.
Using the standard project template, and a test-action in HomeController like:
public ActionResult Test(string id)
{
return Content(id, "text/plain");
}
This works fine for most %-encoded UTF-8 routes, such as:
http://mydevserver/Home/Test/%e4%ba%ac%e9%83%bd%e5%bc%81
with the expected result 京都弁
However using the route:
http://mydevserver/Home/Test/%ee%93%bb
the url is not received correctly.
Aside: %ee%93%bb is %-encoded code-point 0xE4FB; basic-multilingual-plane, private-use area; but ultimately - a valid unicode code-point; you can verify this manually, or via:
string value = ((char) 0xE4FB).ToString();
string encoded = HttpUtility.UrlEncode(value); // %ee%93%bb
Now, what happens next depends on the web-server; on the Visual Studio Development Server (aka cassini), the correct id is received - a string of length one, containing code-point 0xE4FB.
If, however, I do this in IIS or IIS Express, I get a different id, specifically "î“»", code-points: 0xEE, 0x201C, 0xBB. You will immediately recognise the first and last as the start and end of our percent-encoded string... so what happened in the middle?
Well:
code-point 0x93 is “ (source)
code-point 0x201c is “ (source)
It looks to me very much like IIS has performed some kind of quote-translation when processing my url. Now maybe this might have uses in a few scenarios (I don't know), but it is certainly a bad thing when it happens in the middle of a %-encoded UTF-8 block.
Note that HttpContext.Current.Request.Raw also shows this translation has occurred, so this does not look like an MVC bug; note also Darin's comment, highlighting that it works differently in the path vs query portion of the url.
So (two-parter):
is my analysis missing some important subtlety of unicode / url processing?
how do I fix it? (i.e. make it so that I receive the expected character)

id = Encoding.UTF8.GetString(Encoding.Default.GetBytes(id));
This will give you your original id.
IIS uses Default (ANSI) encoding for path characters. Your url encoded string is decoded using that and that is why you're getting a weird thing back.
To get the original id you can convert it back to bytes and get the string using utf8 encoding.
See Unicode and ISAPI Filters
ISAPI Filter is an ANSI API - all values you can get/set using the API
must be ANSI. Yes, I know this is shocking; after all, it is 2006 and
everything nowadays are in Unicode... but remember that this API
originated more than a decade ago when barely anything was 32bit, much
less Unicode. Also, remember that the HTTP protocol which ISAPI
directly manipulates is in ANSI and not Unicode.
EDIT: Since you mentioned that it works with most other characters so I'm assuming that IIS has some sort of encoding detection mechanism which is failing in this case. As a workaround though you can prefix your id with this char and then you can easily detect if the problem occurred (if this char is missing). Not a very ideal solution but it will work. You can then write your custom model binder and a wrapper class in ASP.NET MVC to make your consumption code cleaner.

Once Upon A Time, URLs themselves were not in UTF-8. They were in the ANSI code page. This facilitates the fact that they often are used to select, well, pathnames in the server's file system. In ancient times, IE had an option to tell whether you wanted to send UTF-8 URLs or not.
Perhaps buried in the bowels of the IIS config there is a place to specify the URL encoding, and perhaps not.

Ultimately, to get around this, I had to use request.ServerVariables["HTTP_URL"] and some manual parsing, with a bunch of error-handling fallbacks (additionally compensating for some related glitches in Uri). Not great, but only affects a tiny minority of awkward requests.

A Minor, but annoying niggle - Why does ASP.Net set SQL Server Guids to lowercase?

I'm doing some client-side stuff with Javascript/JQuery with .Net controls which expose their GUID/UniqueIdentifier IDs on the front end to allow them to be manipulated. During debugging something is driving me crazy: The GUIDs in the db are stored in uppercase, however by the time they make it to the front end they're in lowercase.
This means I can't quickly copy and paste IDs into the browser's console to execute JS on the fly when devving/debugging. I have found a just-about-workable way of doing this but I was wondering if anyone knew why this behaviour is the case and whether there is any way of forcing GUIDs to stay uppercase.

According to MSDN docs the Guid.ToString() method will produce lowercase string.
As to why it does that - apparently RFC 4122 states it should be this way.
The hexadecimal values "a" through "f" are output as lower case characters and are case insensitive on input.
Also check this question on SO - net-guid-uppercase-string-format.
So the best thing you can do is to call ToUpper() on your GUID strings, and add extension method as showed in the other answer.

If you're using an Eval template, then I'd see if you can do this via an Extension method.
something like
public static string ToUpperString(this Guid guid, string format = "")
{
string output = guid.ToString(format);
return output.ToUpper();
}
And then in your Eval block,
myGuid.ToUpperString("B")
Or however you need it to look.
I'm on my Mac at the moment so I can't test that, but it should work if you've got the right .Net version.

How to stop .NET from encoding an XML string with XML.Serialization

I am working with some Xml Serialization in ASP.NET 2.0 in a web service. The issue is that I have an element which is defined such as this:
<System.Xml.Serialization.XmlElementAttribute(Form:=System.Xml.Schema.XmlSchemaForm.Unqualified, IsNullable:=True)> _
Public Property COMMENTFIELD() As String
Get
Return CommentField ' This is a string
End Get
Set(ByVal value as String)
CommentField = value
End Set
End Property
Elsewhere in code I am constructing a comment and appending
as a line-break (according to the rules of the web service we are submitting to) between each 'comment', like this: (Please keep in mind that
is a valid XML entity representing character 10 (Line Feed I believe).
XmlObject.COMMENTFIELD = sComment1 & "
" & sComment2
The issue is that .NET tries to do us a favor and encode the & in the comment string which ends up sending the destination web service this: &#xA;, which obviously isn't what we want.
Here is what currently happens:
XmlObject.COMMENTFIELD = sComment1 & "
" & sComment2
Output:
<COMMENTFIELD>comment1 &#xA comment2</COMMENTFIELD>
Output I NEED:
<COMMENTFIELD>comment1
comment2</COMMENTFIELD>
The Question Is: How do I force the .NET runtime to not try and do me any favors in regards to encoding data that I already know is XML compliant and escaped already (btw sComment1 and sComment2 would already be escaped). I'm used to taking care of my XML, not depending on something magical that happens to escape all my data behind my back!
I need to be able to pass valid XML into the COMMENTFIELD property without .NET encoding the data I give it (as it is already XML). I need to know how to tell .NET that the data it is receiving is an XML String, not a normal string that needs escaped.

If you look at the XML spec section 2.4, you see that the & character in an element's text always used to indicate something escaped, so if you want to send an & character, it needs to be escaped, e.g., as & So .Net is converting the literal string you gave it into valid XML.
If you really want the web service to receive the literal text &, then .NET is doing the correct thing. When the web service processes the XML it will convert it back to the same literal string you supplied on your end.
On the other hand, if you want to send the remote web service a string with a newline, you should just construct the string with the newline:
XmlObject.COMMENTFIELD = sComment1 & "\n" & sComment2
.Net will do the correct thing to make sure this is passed correctly on the wire.

It is probably dangerous to mix two different encoding conventions within the same string. Since you have your own convention I recommend explicitly encoding the whole string when it is ready to send and explicitly decoding it on the receiving end.
Try the HttpServerUtility.HtmlEncode Method (System.Web) .
+tom

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex