Navigating to decoded URL doesn't elicit the same action as navigating to the encoded URL - asp.net

Trying to understand why pasting the first link works but not the second one.
Breakdown of the URL, for a clearer view:
Encoded version: [works]
http%3A%2F%2FsomeSite.com
%2FDownload.ashx
%3Frequest
%3DIL7zxW6ETqiYU6cThSNKL8MpY
%252bCRIVFZAVhd8DYPG85C1Uhdd
%252f2hqqmoObeNmuS3dg4bDgGBb0kUUxGZhej89kTaLBHBXS
%252bq3tlaEk2uMEcbWlUZzZQs00sirwZ2IvAvoSpU7HC3N1FaYSNciQ4iHNNmTU
%252f6uMypNlPOJ6enlbZ1OrrYODkaMRdRfGKEba
%252brusdryM4gp
%252bopi1a0gNuMQVCtj
%252bAvDcgXGOcZPNhPAnE
%253d&version=Ma88r6Z6t2JQcnVhVXgp0A%3D%3D
Decoded version: [doesn't work]
http://someSite.com
/Download.ashx
?request=
IL7zxW6ETqiYU6cThSNKL8MpY
+CRIVFZAVhd8DYPG85C1Uhdd
/2hqqmoObeNmuS3dg4bDgGBb0kUUxGZhej89kTaLBHBXS
+q3tlaEk2uMEcbWlUZzZQs00sirwZ2IvAvoSpU7HC3N1FaYSNciQ4iHNNmTU
/6uMypNlPOJ6enlbZ1OrrYODkaMRdRfGKEba
+rusdryM4gp
+opi1a0gNuMQVCtj
+AvDcgXGOcZPNhPAnE
=&version=Ma88r6Z6t2JQcnVhVXgp0A==
If I paste the first link in the browser - it works. A file download automatically starts.
If I paste the second link in the browser - page says Bad request.
Can anyone clarify it for me why the second one doesn't work?

Quoting the URLencodetag:
To “URL encode” or “percent encode” text means to encode it for use in a URL. Some characters are not valid when used as-is in URLs, and so much be URL-encoded (percent-encoded) when appearing in URLs.
The encoding was used for a reason, here because the base64 values for the request and version parameters contains +, / and = which have their own meaning in URLs and therefore need to be URL-encoded.

Related

ASP.NET Core URL Parameter Decoding

I have an ASP.NET Core web API and an issue with encoded URL's in query parameters.
I have an URL parameter like 'path/to/'. The IDENTIFIER part is something like 'HÄÄ/20/19'. This is urlEncoded in frontend to a link URL. The result is a link like
domain.com/new/stuff/path/to/H%C3%84%C3%84%2F20%2F19
Now, at some point, user gets redirected to a controller where this URL is used in a query parameter like:
param=%2Fpath%2Fto%2FH%C3%84%C3%84%2F20%2F19
I'm using request query to get the param
var param = HttpContext.Request.Query["param"].ToString();
After this the value of param is
%2Fpath%2Fto%2FHÄÄ%2F20%2F19
So the LATIN CAPITAL LETTER A WITH DIAERESIS are automatically decoded as the other encoded characters are not.
The actual problem comes when I'm redirecting the user to this URL. It ends up with a referer header where it causes havoc with an error message
System.InvalidOperationException: Invalid non-ASCII or control character in header: 0x00C4
I tried to just replace all the 'Ä' characters with 'A' and the problem is fixed. This is not a real fix though. I cannot encode the whole variable (see above) as it would result in double encoding for other encoded characters.
This problem only occurs with IE11 and Edge (AFAIK) and works fine with at least Chrome.
I'm not 100% sure where the actual problem is and why this is happening so does anyone have any ideas where to start looking and how to fix this without hacking with the string.replace?
EDIT
I could fix it with something like this, but I'm not seriously doing this. Seems way too hacky.
var problemPart = param.Substring(param.LastIndexOf('/') + 1, param.Length - param.LastIndexOf('/') - 1);
var fixedPart = WebUtility.UrlDecode(problemPart);
fixedPart = WebUtility.UrlEncode(fixedPart);
param = param.Replace(problemPart, fixedPart);
EDIT 2
I think the problem is that IE11 and Edge change the encoding by adding control characters to it when the URL ends up to the referer header. The fix I added to the original post doesn't actually fix the problem but just work around it. The control character that gets added to the URL is %C2%84 (so Ä becomes %C3%84%C2%84 instead of just %C3%84)
TEMPORARY WORKAROUND
I basically used the code above to workaround the issue. I iterated the parameter value and re-encoded all the invalid characters in it. This doesn't fix the root cause but works around the issue and user doesn't get any errors to the screen.

ASP.NET Form Action Invalid Percent Encoding

I have a web application that places the user's search term in the query string, in a similar way to Google. E.g. the address might be www.example.com/mysearchpage.aspx?q=searchTerm.
Usually this works fine, but if there is a special character in the search term such as â, the action attribute on the form is encoded to percent encoding and the character is replaced with %u00e2.
If I search for chât I will end up with the URL www.example.com/mysearchpage.aspx?q=châtin the browser's address bar but the action attribute on the form that comes back from the server would be www.example.com/mysearchpage.aspx?q=ch%u00e2t which means that a subsequent form submission fails because the URL is incorrectly formatted.
I have ensured that in IIS the encoding is set to be UTF-8 for Requests, Response Headers and Responses. I have also inspected the page being delivered from IIS in Fiddler and that already includes the incorrectly encoded action.
The encoded format appears to be in a non-standard format as explained in this wikipedia article.
Is there a way to prevent IIS from encoding the form's action in this way?
The solution was to add targetFramework=4.5.2 into the httpRuntime tag in the web.config file.
Previously this was not specified but was specified in the compilation tag, however specifying targetFramework=4.5.1 still caused the problem.

Response.Redirect Ampersand Encoding

Failing to escape an "&" character in HTML markup creates an entity. It is often done inadvertently when linking URLs in a document, and W3C's Markup Validation Service will consider this an error.
I'm wondering, does ASP.NET's Response.Redirect method expect ampersands to be escaped in its url parameter? From reading its MSDN description, I honestly can't tell.
Pass the URL exactly as it should appear in the address bar in the web browser. For example, if you're trying to redirect to http://example.com/?foo=bar&baz=quux, then pass that exact string as-is to Response.Redirect.
try UrlEncode The UrlEncode(String) method can be used to encode the entire URL, including query-string values. If characters such as blanks and punctuation are passed in an HTTP stream without encoding, they might be misinterpreted at the receiving end. URL encoding converts characters that are not allowed in a URL into character-entity equivalents; URL decoding reverses the encoding. For example, when the characters < and > are embedded in a block of text to be transmitted in a URL, they are encoded as %3c and %3e. URLEncode
System.Web.HttpUtility.UrlEncode(string url)

URL just removes parameters

I've created an API for use on my website.
The API I made strips everything using mysql_real_escape_string then puts it into the database.
But the problem I'm having is the URL that my php scripts are using to access the API is cut short sometimes...
Which I have narrowed down to one of the parameters...
When its Ford Mondeo 22' the URL that is passed to simplexml_load_file is
http://mydomain.com/api/create.xml?api_number=brho15p6z1dhqwf5tsff&env=live&number=AJ20023232&title=Ford Mondeo 22'&image=http://mydomain.com/wp-content/uploads/2012/10/914955-150x150.jpg
but the API reports back the URL accessed as
http://mydomain.com/api/create.xml?api_number=brho15p6z1dhqwf5tsff&env=live&number=AJ20023232&title=Ford
If I remove the single quote then everything works fine, any idea how to correct this I suspect there's something I've overlooked when passing variables in the URL
It is the spaces in the "Ford Mondeo 22'" value that is causing the problem. You cannot have a spaces in the URL. You need to use escape characters. The encoded version of the parameter should be
Ford%20Mondeo%2022'
%20 is the escape character for space
I.e. the whole URL should read as follows:
http://mydomain.com/api/create.xml?api_number=brho15p6z1dhqwf5tsff&env=live&number=AJ20023232&title=Ford%20Mondeo%2022'&image=http://mydomain.com/wp-content/uploads/2012/10/914955-150x150.jpg
EDIT:
Your comment indicates that you use PHP. In PHP, you can use urlencode($foo) and urldecode($foo) to switch between the normal string and the encoding string.

when assigning location.href, please explain url encoding (in asp.net and firefox)

In some javascript, I have:
var url = "find.aspx?" + "location=" + encodeURIComponent( address );
alert( url );
location.href = url;
where the value of address is the string "Seattle, WA".
In the alert I see
find.aspx?Seattle%2C%20WA
as I expect.
But on the server side, when I look at Request.Url, the relevant substring I see is
find.aspx?Seattle, WA
And in the Firefox url window I see
find.aspx?location=Seattle%2C WA
So I'm getting three different representations whereas I would expect that in all three places I should see what I see in the alert. My expectation is that the url I assign to location.href should show up as-is in the browser url window, and should be passed as-is to the server in Request.Url (and I would need to decode the values on the server before using them). What's happening?
Firefox converts certain encoded characters into their literal forms as a way to be friendly to users. It will also convert spaces typed into the address bar into %20 for the server.
Update: The reason Firefox doesn't display the comma unencoded is because commas are allowed in URLs, but spaces are not, so it knows that a space is going to be unambiguously interpreted, whereas the pre-encoded comma is different from a non-encoded comma to some servers. see: Can I use commas in a URL?
ASP is probably trying to help you out by auto-un-encoding the string for you.
Update: It looks like ASP.NET unencodes Request.Url for you by default, as mentioned here: QueryString malformed after URLDecode They also mention that you can use HttpRequest.Url.Query to access the un-decoded version.
The alert is the only thing not doing any "magic" for you.
For the alert, you are doing the encoding yourself. Perhaps it looks the same as on the server-side if you removed encodeURIComponent.
On the server side, ASP.NET will always show you the unencoded form. This is to make it easier to directly map to files that also have text that needed to be (un)encoded.
Note that you can replace every letter for its UTF8 representation in URL Encoding. It will still be the same URL. I.e., type the following in the browser window and it will still work: %66%59%6E%64.aspx?location=Seattle%2C%20WA. To only encode the necessary chars, use UrlEncode on the server side if you create a link yourself.
URL encoding can become fairly tricky. You ask to explain it. To know the correct escape of a certain character, you need to know how that character looks in UTF8. The hexadecimal value of the UTF-8 bytes then become the %XX%YY value of your letter. Sometimes it's one %XX, but it can be up to six byte sequences in total (some Chinese characters for instance).
URL Encoding works one way only. Never double-encode or double-unencode. This is prohibited by the specification. Also, because you can encode any character, it is not always possible (as you found out) to do roundtrip encoding/unencoding. If you unencode and re-encode again, it is well possible that the resulting string is different, but syntactically the same.
In HTML, URL Encoding is sometimes interspersed with HTML Encoding. I.e., the ampersand is valid in HTML, but not in HTML. find.aspx?city=A&name=B becomes find.aspx?city=A&name=B in and HTML URL. However, browsers are lenient and will accept wrongly HTML-encoded strings.
Finally, a not on the browser: if you type in a space in a link, even inside an <a> tag, it will escape the space (or other character) for you. Likewise, it will nowadays show the odd characters (é, ï etc) in the address bar, but when it sends it over HTTP, the browser will correctly do the encoding for you.
Update: about anwering your question of needing a "definitive" reference or proof.
While I couldn't find any on the internet, I decided to look for it myself using Reflector. Going through the methods that set, for instance, the HttpRequest.QueryString, you quickly encounter the private method HttpRequest.FillInQueryStringCollection which then calls HttpValueCollection.FillfromEncodedBytes. Somewhat near the end of that method, HttpUtility.UrlDecode is called for the values. Conclusion: do not call it yourself, to prevent double decoding.
You can see this for yourself when you download Reflector and disassemble the .NET libs of System.Web.
For your example you can change this line
var url = "find.aspx?" + "location=" + encodeURIComponent( address );
to
var url = "find.aspx?" + "location=" + address;
and see the address as it is. Bu if address variable contains any '&' character your variable will be corrupt. So you are using encodeURIComponent to encode these things url.
On the Server side all these encoded strings are decoded back. It means encodeURIComponent is just for sending the address variable (whether it contains & character or not) to server side correctly.

Resources