URL just removes parameters - http

I've created an API for use on my website.
The API I made escapes everything using mysql_real_escape_string, then puts it into the database.
But the problem I'm having is that the URL my PHP scripts use to access the API is sometimes cut short.
I have narrowed this down to one of the parameters.
When it's Ford Mondeo 22' the URL that is passed to simplexml_load_file is
http://mydomain.com/api/create.xml?api_number=brho15p6z1dhqwf5tsff&env=live&number=AJ20023232&title=Ford Mondeo 22'&image=http://mydomain.com/wp-content/uploads/2012/10/914955-150x150.jpg
but the API reports back the URL accessed as
http://mydomain.com/api/create.xml?api_number=brho15p6z1dhqwf5tsff&env=live&number=AJ20023232&title=Ford
If I remove the single quote then everything works fine. Any idea how to correct this? I suspect there's something I've overlooked when passing variables in the URL.

It is the spaces in the "Ford Mondeo 22'" value that are causing the problem. You cannot have spaces in a URL; you need to percent-encode them. The encoded version of the parameter should be
Ford%20Mondeo%2022'
%20 is the escape sequence for a space.
I.e. the whole URL should read as follows:
http://mydomain.com/api/create.xml?api_number=brho15p6z1dhqwf5tsff&env=live&number=AJ20023232&title=Ford%20Mondeo%2022'&image=http://mydomain.com/wp-content/uploads/2012/10/914955-150x150.jpg
EDIT:
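Your comment indicates that you use PHP. In PHP, you can use urlencode($foo) and urldecode($foo) to switch between the normal string and the encoded string. Note that urlencode() writes a space as + while rawurlencode() writes it as %20; both are read back as a space when the query string is parsed.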
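As a rough illustration of the same idea, here is a minimal sketch in JavaScript rather than PHP (an assumption made purely for illustration; the question's own code is PHP). It encodes each parameter value separately before joining the URL together, using the values from the question.

```javascript
// Values taken from the question; only the *values* are encoded,
// never the separators (?, &, =) that give the URL its structure.
const base = "http://mydomain.com/api/create.xml";
const title = "Ford Mondeo 22'";
const image = "http://mydomain.com/wp-content/uploads/2012/10/914955-150x150.jpg";

const encodedTitle = encodeURIComponent(title);
const encodedImage = encodeURIComponent(image);

const url =
  base +
  "?api_number=brho15p6z1dhqwf5tsff&env=live&number=AJ20023232" +
  "&title=" + encodedTitle +
  "&image=" + encodedImage;

console.log(encodedTitle); // Ford%20Mondeo%2022'
console.log(url);
```

encodeURIComponent leaves the single quote alone, so the result matches the encoded value shown above; other encoders (PHP's among them) may also escape the quote as %27, which decodes to the same character.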

Related

Navigating to decoded URL doesn't elicit the same action as navigating to the encoded URL

Trying to understand why pasting the first link works but not the second one.
Breakdown of the URL, for a clearer view:
Encoded version: [works]
http%3A%2F%2FsomeSite.com
%2FDownload.ashx
%3Frequest
%3DIL7zxW6ETqiYU6cThSNKL8MpY
%252bCRIVFZAVhd8DYPG85C1Uhdd
%252f2hqqmoObeNmuS3dg4bDgGBb0kUUxGZhej89kTaLBHBXS
%252bq3tlaEk2uMEcbWlUZzZQs00sirwZ2IvAvoSpU7HC3N1FaYSNciQ4iHNNmTU
%252f6uMypNlPOJ6enlbZ1OrrYODkaMRdRfGKEba
%252brusdryM4gp
%252bopi1a0gNuMQVCtj
%252bAvDcgXGOcZPNhPAnE
%253d&version=Ma88r6Z6t2JQcnVhVXgp0A%3D%3D
Decoded version: [doesn't work]
http://someSite.com
/Download.ashx
?request=
IL7zxW6ETqiYU6cThSNKL8MpY
+CRIVFZAVhd8DYPG85C1Uhdd
/2hqqmoObeNmuS3dg4bDgGBb0kUUxGZhej89kTaLBHBXS
+q3tlaEk2uMEcbWlUZzZQs00sirwZ2IvAvoSpU7HC3N1FaYSNciQ4iHNNmTU
/6uMypNlPOJ6enlbZ1OrrYODkaMRdRfGKEba
+rusdryM4gp
+opi1a0gNuMQVCtj
+AvDcgXGOcZPNhPAnE
=&version=Ma88r6Z6t2JQcnVhVXgp0A==
If I paste the first link in the browser - it works. A file download automatically starts.
If I paste the second link in the browser - page says Bad request.
Can anyone clarify it for me why the second one doesn't work?
Quoting the urlencode tag:
To “URL encode” or “percent encode” text means to encode it for use in a URL. Some characters are not valid when used as-is in URLs, and so must be URL-encoded (percent-encoded) when appearing in URLs.
The encoding was used for a reason: the base64 values for the request and version parameters contain +, / and =, which have their own meaning in URLs and therefore need to be URL-encoded.
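A small sketch of what can go wrong, using JavaScript's standard URLSearchParams as a stand-in for the server's query parser (an assumption; the actual Download.ashx handler may behave differently). The sample value is a shortened, hypothetical base64-like fragment, not the real token.

```javascript
// Hypothetical shortened base64-like value containing +, / and =
const raw = "MpY+CRIVFZ/2hqq=";

// Pasted into the query string as-is: a form-style parser reads "+" as a space.
const pastedAsIs = new URLSearchParams("request=" + raw).get("request");

// Percent-encoded first: the parser hands back the original value.
const encoded = encodeURIComponent(raw);
const pastedEncoded = new URLSearchParams("request=" + encoded).get("request");

console.log(pastedAsIs);    // MpY CRIVFZ/2hqq=   (the "+" became a space)
console.log(encoded);       // MpY%2BCRIVFZ%2F2hqq%3D
console.log(pastedEncoded); // MpY+CRIVFZ/2hqq=
```

Once the + has turned into a space the value is no longer valid base64, which is one plausible reason for a Bad request response.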

In ASP.NET, why is there UrlEncode() AND UrlPathEncode()?

In a recent project, I had the pleasure of troubleshooting a bug that involved images not loading when spaces were in the filename. I thought "What a simple issue, I'll UrlEncode() it!" But, NAY! Simply using UrlEncode() didn't resolve the problem.
The new problem was that the HttpUtility.UrlEncode() method switched spaces to pluses (+) instead of the %20 the browser wanted. So file+image+name.jpg would return not-found while file%20image%20name.jpg was found correctly.
Thankfully, a coworker pointed out HttpUtility.UrlPathEncode() to me, which uses %20 for spaces instead of +.
WHY are there two ways of handling Url encoding? WHY are there two commands that behave so differently?
UrlEncode is useful for use with a QueryString as browsers tend to use a + here in place of a space when submitting forms with the GET method.
UrlPathEncode simply replaces all characters that cannot be used within a URL, such as <, > and the space character.
Both MSDN links include this quote:
You can encode a URL using with the UrlEncode method or the
UrlPathEncode method. However, the methods return different results.
The UrlEncode method converts each space character to a plus character
(+). The UrlPathEncode method converts each space character into the
string "%20", which represents a space in hexadecimal notation. Use
the UrlPathEncode method when you encode the path portion of a URL in
order to guarantee a consistent decoded URL, regardless of which
platform or browser performs the decoding.
So in a URL you have the path, then a ?, and then the parameters (i.e. http://some_path/page.aspx?parameters). URL paths encode spaces differently than URL parameters do, and that's why there are two versions. For a long time spaces were not valid in a URL path, but were accepted (written as +) in the parameters.
In other words, the formatting of URLs has changed over time. For a long time only ASCII characters could be in a URL, too.
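The same path-versus-query split can be seen outside ASP.NET. The sketch below uses JavaScript built-ins as analogues (an assumption for illustration only; these are not the .NET methods discussed above): encodeURIComponent writes a space as %20, while URLSearchParams uses form-style encoding and writes it as +.

```javascript
const fileName = "file image name.jpg";

// Path-style encoding: space becomes %20
const pathStyle = encodeURIComponent(fileName);

// Query-string (form) encoding: space becomes +
const queryStyle = new URLSearchParams({ name: fileName }).toString();

// In a path, "+" is just a literal plus sign, so it never decodes to a space.
const plusInPath = decodeURIComponent("file+image+name.jpg");

console.log(pathStyle);  // file%20image%20name.jpg
console.log(queryStyle); // name=file+image+name.jpg
console.log(plusInPath); // file+image+name.jpg
```

This is why a file requested as file+image+name.jpg is looked up under a name containing literal plus signs and is not found.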

What are the risks of allowing quote characters as part of a URL parameter?

I need to allow the user to submit queries as follows:
/search/"my search string"
but it's failing because of request validation, as outlined in the following 2 questions:
How to include quote characters as a route parameter? Getting "Illegal characters in path" message
How to modify request validation?
I'm currently trying to figure out how to disable request validation for the quote character, but I'd like to know the risks before I actually put the site live with this disabled. I will not disable request validation unless I can disable it for the quote character only, so I do intend to keep disallowing every other character that's currently not allowed.
According to the URI generic syntax specification (RFC 2396), the double-quote character is explicitly excluded and must be escaped (i.e. %22). See section 2.4.3. The reason given in the spec:
The angle-bracket "<" and ">" and double-quote (") characters are excluded because they are often used as the delimiters around URI in text documents and protocol fields.
You can see easily why this is the case -- imagine trying to create a link in HTML to your URL:
<a href="http://somesite/search/"my search string""/>
That would fail HTML parsing (and also breaks SO's syntax highlighting). You also would have trouble doing basic things with the URL like emailing it to someone (the email client wouldn't parse the URL correctly), posting it on a message board, sending it in an instant message, etc.
For what it's worth, spaces are also explicitly excluded (same section of the RFC explains why).
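A minimal sketch of escaping the search string before it goes into the path, written in JavaScript for illustration (the site in question is ASP.NET, so treat this only as a demonstration of the encoding, not of the routing or request validation involved).

```javascript
const query = '"my search string"';

// Double quotes become %22 and spaces become %20.
const segment = encodeURIComponent(query);
const href = "http://somesite/search/" + segment;

// The encoded URL can now sit inside a quoted HTML attribute safely.
const html = '<a href="' + href + '">search</a>';

console.log(segment); // %22my%20search%20string%22
console.log(html);
```

The server would then decode %22 back into a quote when it reads the route value.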

When assigning location.href, please explain URL encoding (in ASP.NET and Firefox)

In some javascript, I have:
var url = "find.aspx?" + "location=" + encodeURIComponent( address );
alert( url );
location.href = url;
where the value of address is the string "Seattle, WA".
In the alert I see
find.aspx?location=Seattle%2C%20WA
as I expect.
But on the server side, when I look at Request.Url, the relevant substring I see is
find.aspx?location=Seattle, WA
And in the Firefox url window I see
find.aspx?location=Seattle%2C WA
So I'm getting three different representations whereas I would expect that in all three places I should see what I see in the alert. My expectation is that the url I assign to location.href should show up as-is in the browser url window, and should be passed as-is to the server in Request.Url (and I would need to decode the values on the server before using them). What's happening?
Firefox converts certain encoded characters into their literal forms as a way to be friendly to users. It will also convert spaces typed into the address bar into %20 for the server.
Update: Firefox displays the space unencoded but leaves the comma encoded because commas are allowed in URLs while spaces are not. A space will be interpreted unambiguously, whereas a pre-encoded comma may differ from a non-encoded comma to some servers. See: Can I use commas in a URL?
ASP.NET is probably trying to help you out by automatically decoding the string for you.
Update: It looks like ASP.NET unencodes Request.Url for you by default, as mentioned here: QueryString malformed after URLDecode. They also mention that you can use HttpRequest.Url.Query to access the un-decoded version.
The alert is the only thing not doing any "magic" for you.
For the alert, you are doing the encoding yourself. Perhaps it looks the same as on the server-side if you removed encodeURIComponent.
On the server side, ASP.NET will always show you the unencoded form. This is to make it easier to directly map to files that also have text that needed to be (un)encoded.
Note that in URL encoding you can replace every letter with its UTF-8 representation. It will still be the same URL. I.e., type the following in the browser window and it will still work: %66%69%6E%64.aspx?location=Seattle%2C%20WA. To encode only the necessary characters, use UrlEncode on the server side if you create a link yourself.
URL encoding can become fairly tricky, and you asked for an explanation. To know the correct escape for a certain character, you need to know how that character is represented in UTF-8. The hexadecimal values of the UTF-8 bytes then become the %XX%YY sequence for your character. Sometimes it's a single %XX, but it can be up to four such bytes in total (most Chinese characters, for instance, take three).
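To make the byte-to-%XX relationship concrete, here is a short JavaScript sketch. The sample characters are arbitrary choices for illustration; the ones here take one to four UTF-8 bytes each.

```javascript
// Each character's percent-encoding is just its UTF-8 bytes written as %XX.
const samples = ["A", "é", "中", "😀"];
const encoder = new TextEncoder(); // encodes strings to UTF-8 bytes

const rows = samples.map((ch) => ({
  char: ch,
  bytes: encoder.encode(ch).length,
  encoded: encodeURIComponent(ch),
}));

for (const row of rows) {
  console.log(row.char, row.bytes, row.encoded);
}
// A  1 A
// é  2 %C3%A9
// 中 3 %E4%B8%AD
// 😀 4 %F0%9F%98%80

// Plain letters can be written in escaped form too and decode to the same text.
const decodedName = decodeURIComponent("%66%69%6E%64");
console.log(decodedName); // find
```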
URL encoding works one way only. Never double-encode or double-unencode. This is prohibited by the specification. Also, because you can encode any character, it is not always possible (as you found out) to do round-trip encoding/unencoding. If you unencode and re-encode, it is quite possible that the resulting string is different, but equivalent.
In HTML, URL encoding is sometimes interspersed with HTML encoding. For example, the ampersand is valid in a URL, but not in HTML: find.aspx?city=A&name=B becomes find.aspx?city=A&amp;name=B in an HTML URL. However, browsers are lenient and will accept wrongly HTML-encoded strings.
Finally, a note on the browser: if you type a space in a link, even inside an <a> tag, it will escape the space (or other character) for you. Likewise, it will nowadays show the odd characters (é, ï etc.) in the address bar, but when it sends the request over HTTP, the browser will correctly do the encoding for you.
Update: about answering your question of needing a "definitive" reference or proof.
While I couldn't find any on the internet, I decided to look for it myself using Reflector. Going through the methods that set, for instance, HttpRequest.QueryString, you quickly encounter the private method HttpRequest.FillInQueryStringCollection, which then calls HttpValueCollection.FillFromEncodedBytes. Somewhat near the end of that method, HttpUtility.UrlDecode is called for the values. Conclusion: do not call it yourself, to prevent double decoding.
You can see this for yourself when you download Reflector and disassemble the .NET libs of System.Web.
For your example you can change this line
var url = "find.aspx?" + "location=" + encodeURIComponent( address );
to
var url = "find.aspx?" + "location=" + address;
and see the address as it is. But if the address variable contains any '&' character, your value will be corrupted. That is why you use encodeURIComponent: to encode such characters before they go into the URL.
On the Server side all these encoded strings are decoded back. It means encodeURIComponent is just for sending the address variable (whether it contains & character or not) to server side correctly.
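A quick sketch of that corruption, using a hypothetical address that contains an ampersand and JavaScript's URLSearchParams as a stand-in for the server-side parser (an assumption; ASP.NET's own parsing is not being reproduced here).

```javascript
// Hypothetical address containing "&"
const address = "Barnes & Noble, Seattle, WA";

// Without encoding, "&" is read as a parameter separator.
const rawQuery = "location=" + address;
const rawParsed = new URLSearchParams(rawQuery);

// With encoding, the whole value survives the trip.
const safeQuery = "location=" + encodeURIComponent(address);
const safeParsed = new URLSearchParams(safeQuery);

console.log(rawParsed.get("location"));  // "Barnes " (value cut at the &)
console.log([...rawParsed.keys()]);      // [ 'location', ' Noble, Seattle, WA' ]
console.log(safeParsed.get("location")); // Barnes & Noble, Seattle, WA
```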

Ampersands in URLRewriter Query Strings

I have a query string parameter value that contains an ampersand. For example, a valid value for the parameter may be:
a & b
When I generate the URL that contains the parameter, I'm using System.Web.HttpUtility.UrlEncode() to make each element URL-friendly. It's (correctly) giving me a URL like:
http://example.com/foo?bar=a+%26+b
The problem is that ASP.NET's Request object is interpreting the (encoded) ampersand as a Query String parameter delimiter, and is thus splitting my value into 2 parts (the first has "bar" as the parameter name; the second has a null name).
It appears that ASP.NET is URL-decoding the URL first and then using that when parsing the query string.
What's the best way to work around this?
UPDATE: The problem hinges on URLRewriter (a third-party plugin) and not ASP.NET itself. I've changed the title to reflect this, but I'll leave the rest of the question text as-is until I find out more about the problem.
Man,
I am in the same boat. I have spent hours and hours trying to figure out what the problem is and, as you said, it is a bug in both, since normal links that contain unusual characters or UTF-8 encoded characters are parsed fine by ASP.NET.
I think we have to switch to MVC routing.
Update: You won't believe it, I have found the problem, and it is strange: it is with IIS.
Try launching your page from the Visual Studio development server and Unicode characters will be parsed just fine, but if you launch the page from IIS 7 it will give you ???? characters.
Hope somebody can shed some light here.
I would have thought that %26 and '&' mean exactly the same thing to the web server, so it's the expected behavior. UrlEncode is for encoding URLs, not encoding query strings.
... hang on ...
Try searching for abc&def in google, you'll get:
http://www.google.com.au/search?q=abc%26def
So your query string is correct, %26 is a literal ampersand. Hmm you're right, sounds like a bug. How do you go with an & instead of the %26 ?
Interesting reading:
http://www.stylusstudio.com/xsllist/200104/post11060.html
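The symptom described in the question (an encoded ampersand splitting the value in two) is what you get when a query string is decoded before it is split. The JavaScript sketch below contrasts the two orders; both parser functions are hypothetical helpers written for this illustration and are not URLRewriter's or ASP.NET's actual code.

```javascript
// Decode one form-encoded component: "+" means space, then percent-decoding.
function decodeComponent(s) {
  return decodeURIComponent(s.replace(/\+/g, " "));
}

// Split one "name=value" pair on its first "=".
function splitPair(pair) {
  const i = pair.indexOf("=");
  return i === -1 ? [pair, ""] : [pair.slice(0, i), pair.slice(i + 1)];
}

// Correct order: split on "&" first, then decode each piece.
function parseSplitFirst(query) {
  return query.split("&").map((pair) => splitPair(pair).map(decodeComponent));
}

// Suspected buggy order: decode everything first, then split on "&".
function parseDecodeFirst(query) {
  return decodeComponent(query).split("&").map(splitPair);
}

const query = "bar=a+%26+b";
console.log(parseSplitFirst(query));  // [ [ 'bar', 'a & b' ] ]
console.log(parseDecodeFirst(query)); // [ [ 'bar', 'a ' ], [ ' b', '' ] ]
```

The second result has the same shape as the one reported in the question: the value is cut at the ampersand and a second, value-less parameter appears.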
Switching to UrlRewritingNet.UrlRewrite did not help, as it apparently has the same bug. I'm thinking it might have something to do with ASP.NET after all.
I think URLRewriter has a problem with nameless parameters (null name).
I had a similar problem. When I gave my nameless parameter a (dummy) name, everything worked as expected.

Resources