.Net Uri Encoding RFC 2396 vs RFC 3986 - asp.net

First, some quick background... As part of an integration with a third party vendor, I have a C# .Net web application that receives a URL with a bunch of information in the query string. That URL is signed with an MD5 hash and a shared secret key. Basically, I pull in the query string, remove their hash, perform my own hash on the remaining query string, and make sure mine matches the one that was supplied.
I'm retrieving the Uri in the following way...
Uri uriFromVendor = new Uri(Request.Url.ToString());
string queryFromVendor = uriFromVendor.Query.Substring(1); //Substring to remove question mark
My issue is stemming from query strings that contain special characters like an umlaut (ü). The vendor is calculating their hash based on the RFC 2396 representation which is %FC. My C# .Net app is calculating it's hash based on the RFC 3986 representation which is %C3%BC. Needless to say, our hashes don't match, and I throw my errors.
Strangely, the documentation for the Uri class in .Net says that it should follow RFC 2396 unless otherwise set to RFC 3986, but I don't have the entry in my web.config file that they say is required for this behavior.
How can I force the Uri constructor to use the RFC 2396 convention?
Failing that, is there an easy way to convert the RFC 3986 octet pairs to RFC 2396 octets?

Nothing to do with your question, but why are you creating a new Uri here? You can just do string queryFromVendor = Request.Url.Query.Substring(1); – atticae
+1 for atticae! I went back to try removing the extraneous Uri I was creating and suddenly, the string had the umlaut encoded as UTF-8 instead of UTF-16.
At first, I didn't think this would work. Somewhere along the line, I had tried retrieving the url using Request.QueryString, but this was causing the umlaut to come through as %ufffd which is the � character. In the interest of taking a fresh perspective, I tried atticae's suggestion and it worked.
I'm pretty sure the answer has to do with something I read here.
C# uses UTF-16 in all its strings, with tools to encode when it comes to dealing with streams and files that bring us onto...
ASP.NET uses UTF-8 by default, and it's hard to think of a time when it isn't a good choice...
My problems stemmed from here...
Uri uriFromVendor = new Uri(Request.Url.ToString());
By taking the Request.Url uri and creating another uri, it was encoding as the C# standard UTF-16. By using the original uri, it remained in the .Net standard UTF-8.
Thanks to all for your help.

I'm wondering if this is a bit of a red herring:
I say this because FC is the UTF16 representation of the u with umlaut; C2BC is the UTF8 representation.
I wonder if one of the System.Text.Encoding methods to convert the source data into a normal .Net string might help.
This question might be of interest too: Encode and Decode rfc2396 URLs

I don't know about the standard encoding for Uri constructors, but if everything else fails you could always decode the URL yourself and encode it in whatever encoding you like.
The HttpUtility-Class has an UrlDecode() and UrlEncode() method, which lets you specify the System.Text.Encoding as second parameter.
For example:
string decodedQueryString = HttpUtility.UrlDecode(Request.Url.Query.Substring(1));
string encodedQueryString = HttpUtility.UrlEncode(decodedQueryString, System.Text.Encoding.GetEncoding("utf-16"));
// calc hash here

Related

Meaning of =3D in malicious URLs

My server logs show a many attempts to access non existing sides. These are the "usual" bots scanning for known vulnerabilities. Many of the URLs contain =3D, e.g.
/?q=3Duser%2Fpassword&name%5B%23p=
/user/register/?element_parents=3Daccou=
/wp-admin/admin-post.php?swp_debug=3Dlo=
%3D is the url encoded value of = so I would expect to find %3D within the URL but not =3D. However, =3D can be found all over the logs. What is the meaning of this?
=3D is an example of a Quoted-Printable encoding for ASCII 0x3D, or the equals sign character (=).
You don't usually see this in URLs. It's not the normal encoding to use. It's a standard MIME type, an alternative to using base64. It looks like the request is expecting the app to decode the query string using Quoted-Printable, and then use the resulting path in some further redirect.

How to embed a colon in the fragment part of URL without it being encoded?

I am trying to construct the below URL:
https://console.aws.amazon.com/elasticmapreduce/home?region=us-east-1#cluster-details:j-1IGU6572KT6LB
I am not sure how to include the :j-1IGU6572KT6LB. When I include :`, it gets encoded. Trying to see if that can be avoided.
This is what I have:
UriBuilder
.fromPath("console.aws.amazon.com")
.path("elasticmapreduce")
.path("home")
.queryParam("region","us-east-1")
.fragment("cluster-details")
.port(-1)
.scheme("https")
If the ":" in fragments is encoded that appears to me to be a bug (see RFC 3986, Section 3.5 and 3.3). I recommend to open a bug report.
OTOH, if a recipient fails to handle the percent-encoded colon, that's a bug as well.

How to determine if a URI is escaped?

I am using apache commons HTTPClient to download web resources. The URI for these resources come from third parties, I do not generate them.
The commons httpclient requires a URI object to be given to the GetMethod object.
The URI constructor takes a string (for the uri) and a boolean specifying if it is escaped or not.
Currently, I am doing the following to determine if the original url I am given is already escaped...
boolean isEscaped = URIUtil.getPathQuery(originalUrl).contains("%");
m.setURI(new URI(originalUrl, isEscaped));
Is this the correct way to determine if a uri is already escaped?
Update...
according to wikipedia ( Well, according to wikipedia ( http://en.wikipedia.org/wiki/Percent-encoding ) it says that percent is a reserved character and should always be encoded... I am quoting verbatim here...
Percent-encoding the percent character[edit] Because the percent ("%")
character serves as the indicator for percent-encoded octets, it must
be percent-encoded as "%25" for that octet to be used as data within a
URI.
Doesnt this mean that you can never have a naked '%' character in a valid uri?
Also, the uri(s) come from various sources so I cannot be sure if they are escaped or unescaped.
This wouldn't work. It's possible the un-encoded string has a % in it already.
ex:
https://www.google.com/#q=like%25&safe=off
is the url for a google search for like%. In unescaped form it would be https://www.google.com/#q=like%&safe=off
Your consumers should let you know if the URI is escaped or not.

Why is HttpUtility.UrlPathEncode marked as "do not use"?

Why does the documentation of .NET for HttpUtility.UrlPathEncode for .NET 4.5 state
Do not use; intended only for browser compatibility. Use UrlEncode.
UrlEncode does not do the same, it encodes a string for the parameter part of a URL, not for the path part. Is there a better way to encode a string for the path part and why shouldn't I use this function, which is in the framework since 1.1 and works?
based on MSDN, they recommend to use UrlEncode to grantee that it is working for all platforms and browsers
You can encode a URL using with the UrlEncode method or the UrlPathEncode method. However, the methods return different results. The UrlEncode method converts each space character to a plus character (+). The UrlPathEncode method converts each space character into the string "%20", which represents a space in hexadecimal notation. Use the UrlPathEncode method when you encode the path portion of a URL in order to guarantee a consistent decoded URL, regardless of which platform or browser performs the decoding.
Also UrlEncode is using UTF-8 encoding so if you are sending query string in different language like Arabic you should use UrlEncode
Use Uri.EscapeUriString if you want encode a path of a URL. HttpUtility.UrlEncode is for query parameters and encode also slashes which is wrong for a path.

Is IIS performing an illegal character substitution? If so, how to stop it?

Context: ASP.NET MVC running in IIS, with a a UTF-8 %-encoded URL.
Using the standard project template, and a test-action in HomeController like:
public ActionResult Test(string id)
{
return Content(id, "text/plain");
}
This works fine for most %-encoded UTF-8 routes, such as:
http://mydevserver/Home/Test/%e4%ba%ac%e9%83%bd%e5%bc%81
with the expected result 京都弁
However using the route:
http://mydevserver/Home/Test/%ee%93%bb
the url is not received correctly.
Aside: %ee%93%bb is %-encoded code-point 0xE4FB; basic-multilingual-plane, private-use area; but ultimately - a valid unicode code-point; you can verify this manually, or via:
string value = ((char) 0xE4FB).ToString();
string encoded = HttpUtility.UrlEncode(value); // %ee%93%bb
Now, what happens next depends on the web-server; on the Visual Studio Development Server (aka cassini), the correct id is received - a string of length one, containing code-point 0xE4FB.
If, however, I do this in IIS or IIS Express, I get a different id, specifically "î“»", code-points: 0xEE, 0x201C, 0xBB. You will immediately recognise the first and last as the start and end of our percent-encoded string... so what happened in the middle?
Well:
code-point 0x93 is “ (source)
code-point 0x201c is “ (source)
It looks to me very much like IIS has performed some kind of quote-translation when processing my url. Now maybe this might have uses in a few scenarios (I don't know), but it is certainly a bad thing when it happens in the middle of a %-encoded UTF-8 block.
Note that HttpContext.Current.Request.Raw also shows this translation has occurred, so this does not look like an MVC bug; note also Darin's comment, highlighting that it works differently in the path vs query portion of the url.
So (two-parter):
is my analysis missing some important subtlety of unicode / url processing?
how do I fix it? (i.e. make it so that I receive the expected character)
id = Encoding.UTF8.GetString(Encoding.Default.GetBytes(id));
This will give you your original id.
IIS uses Default (ANSI) encoding for path characters. Your url encoded string is decoded using that and that is why you're getting a weird thing back.
To get the original id you can convert it back to bytes and get the string using utf8 encoding.
See Unicode and ISAPI Filters
ISAPI Filter is an ANSI API - all values you can get/set using the API
must be ANSI. Yes, I know this is shocking; after all, it is 2006 and
everything nowadays are in Unicode... but remember that this API
originated more than a decade ago when barely anything was 32bit, much
less Unicode. Also, remember that the HTTP protocol which ISAPI
directly manipulates is in ANSI and not Unicode.
EDIT: Since you mentioned that it works with most other characters so I'm assuming that IIS has some sort of encoding detection mechanism which is failing in this case. As a workaround though you can prefix your id with this char and then you can easily detect if the problem occurred (if this char is missing). Not a very ideal solution but it will work. You can then write your custom model binder and a wrapper class in ASP.NET MVC to make your consumption code cleaner.
Once Upon A Time, URLs themselves were not in UTF-8. They were in the ANSI code page. This facilitates the fact that they often are used to select, well, pathnames in the server's file system. In ancient times, IE had an option to tell whether you wanted to send UTF-8 URLs or not.
Perhaps buried in the bowels of the IIS config there is a place to specify the URL encoding, and perhaps not.
Ultimately, to get around this, I had to use request.ServerVariables["HTTP_URL"] and some manual parsing, with a bunch of error-handling fallbacks (additionally compensating for some related glitches in Uri). Not great, but only affects a tiny minority of awkward requests.

Resources