Query string: Can a query string contain a URL that also contains query strings? - asp.net

Example:
http://foo.com/generatepdf.aspx?u=http://foo.com/somepage.aspx?color=blue&size=15
I added the iis tag because I am guessing it also depends on what server technology you use?

The server technology shouldn't make a difference.
When you pass a value to a query string you need to url encode the name/value pair. If you want to pass in a value that contains a special character such as a question mark (?) you'll just need to encode that character as %3F. If you then needed to recursively pass another query string to the encoded url, you'll need to double/triple/etc encode the url resulting in the original ? turning into %253F, %25253F, etc.

you'll probably want to UrlEncode the url that is in the query string.

As reported in http://en.wikipedia.org/wiki/Query_string
W3C recommends that all web servers support semicolon separators in
addition to ampersand separators (link reported on that wiki page) to allow
application/x-www-form-urlencoded query strings in URLs within HTML
documents without having to entity escape ampersands.
So, I suppose the answer to the question is yes and you have to change in a ";" semicolon the "&" ampersand usaully used for key=value separator.

Yes it can, as far as I can tell, according to RFC 3986: Uniform Resource Identifier (URI): Generic Syntax (from year 2005):
This is the BNF for the query string:
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "#"
The spec says:
The characters slash ("/") and question mark ("?") may represent data within the query component.
as query components are often used to carry identifying information in the form of "key=value" pairs and one frequently used value is a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters
(But I suppose your server framework might or might not follow the specification exactly.)

No, but you can encode the url and decode it later.

Related

Is the "?" in URLs completely arbitrary (disregarding reserved/non-escaped character problems, etc.)?

For example, if for whatever stupid reason I configured my server to parse the URL by splitting the queries by the "^" symbol (escaped if necessary) and the "-" symbol instead of the "?" and "&", would I run into any trouble at all apart from a confused user?
Will the browser/HTTP request sent treat it differently in a way that may be detrimental to my up and coming "power minus" business?
? is not arbitrary but defined in the URI RFC section 3.4 Query, I dont' think you can change that.
The Query component internal syntax (how name=value couples are encoded) is not defined by the URI RFC, separators can be defined by other specifications:
& is defined as separator of the application/x-www-form-urlencoded content type by HTML Spec. You may change this aspect supporting for example ; as separator, but you would have in any case to support & for when processing the request produced by an HTML FORM.

How to determine if a URI is escaped?

I am using apache commons HTTPClient to download web resources. The URI for these resources come from third parties, I do not generate them.
The commons httpclient requires a URI object to be given to the GetMethod object.
The URI constructor takes a string (for the uri) and a boolean specifying if it is escaped or not.
Currently, I am doing the following to determine if the original url I am given is already escaped...
boolean isEscaped = URIUtil.getPathQuery(originalUrl).contains("%");
m.setURI(new URI(originalUrl, isEscaped));
Is this the correct way to determine if a uri is already escaped?
Update...
according to wikipedia ( Well, according to wikipedia ( http://en.wikipedia.org/wiki/Percent-encoding ) it says that percent is a reserved character and should always be encoded... I am quoting verbatim here...
Percent-encoding the percent character[edit] Because the percent ("%")
character serves as the indicator for percent-encoded octets, it must
be percent-encoded as "%25" for that octet to be used as data within a
URI.
Doesnt this mean that you can never have a naked '%' character in a valid uri?
Also, the uri(s) come from various sources so I cannot be sure if they are escaped or unescaped.
This wouldn't work. It's possible the un-encoded string has a % in it already.
ex:
https://www.google.com/#q=like%25&safe=off
is the url for a google search for like%. In unescaped form it would be https://www.google.com/#q=like%&safe=off
Your consumers should let you know if the URI is escaped or not.

Parameter separator in URLs, the case of misused question mark

What I don't really understand is the benefit of using '?' instead of '&' in urls:
It makes nobody's life easier if we use a different character as the first separator character.
Can you come up with a reasonable explanation?
EDIT: after more research I found that "&" can be a part of file name (terms&conditions.html) so "?" is a good separator. But still I think using "?" for separators makes lives easier (from url generators and parsers point of view):
Is there any advantage in using "&" which is not clear at the first glance?
From the URI spec's (RFC 3986) point of view, the only separator here is "?". the format of the query is opaque; the ampersands just are something that HTML happens to use for form submissions.
The answer's pretty much in this article - http://www.skorks.com/2010/05/what-every-developer-should-know-about-urls/ . To highlight it, here goes :
Query is the preferred way to send some parameters to a resource on
the server. These are key=value pairs and are separated from the rest
of the URL by a ? (question mark) character and are normally separated
from each other by & (ampersand) characters. What you may not know is
the fact that it is legal to separate them from each other by the ;
(semi-colon) character as well. The following URLs are equivalent:
http://www.blah.com/some/crazy/path.html?param1=foo&param2=bar
http://www.blah.com/some/crazy/path.html?param1=foo;param2
The RFC 3896 (https://www.ietf.org/rfc/rfc3986.txt) defines general and sub delimiters ... '?' is a general, '&' and ';' are sub. The spec is pretty clear about that.
In this case the latter '?' chars would be treated as part of the query. If the query parser follows the spec strictly, it would then pass the whole query on to the app-destination. If the app-destination could choose to further process the query string in a manner which treats the ? as a param name-value pairs delimiter, that is up to the app's designers.
My guess is that this often 'just works' because code that splits query strings and the original uri uses all delimiters for matching: 1) first query is split on '?' then 2) query string is parsed using char match list that includes '?' (convenience only).... This could be occurring in ubiquitous parsing libraries already.

Is a trailing ampersand legal in a URL?

A URL like
http://localhost/path?a=b&c=d
is fine - but what is the status of the same URL with a trailing ampersand?
http://localhost/path?a=b&c=d&
For example the Java Servlet API allows it where Scala's Spray does not (ie it throws an error).
I've tried to find the answer in the URI syntax spec, but not sure how to parse their grammar.
The URI syntax spec is for generic URIs. It allows anything in the query. I am not aware of any specification which actually specifies ampersand-separated key=value pairs. I believe it is merely convention. I know that PHP, for example, offers an option for using a different separator. But now, everyone uses ampersand-separated things when they want key-value pairs. You still occasionally come across things which use it for just a simple string, e.g. http://example.com/?example. This is perfectly valid.
The basic answer, though, is that & is valid anywhere in the query string, including at the end.
Demistifying the RFC syntax, or Why & is valid anywhere in the query string:
First, you have
query = *( pchar / "/" / "?" )
(So a query string is made of any number of pchar and literal slashes and question marks.)
Going back, you have
pchar = unreserved / pct-encoded / sub-delims / ":" / "#"
And earlier still
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
So a literal & is in sub-delims which is in pchar, so it's valid in query
I think there is an unwritten rule that all RFCs must be near impossible to understand. You're not the first person unable to parse the grammar and - in my humble opinion - Spray also failed too.
There is nothing wrong with a trailing ampersand. It is a legal character in the URI used to separate parameters. A trailing ampersand may be pointless, but it isn't invalid. Spray should (again, only in my opinion) be simply ignoring it.
If you have a url like the following in a plain-text email:
Please follow this link https://server/doc?param=test. to reset the password of your account.
It becomes a useful feature since if the last parameter ends with ". " (dot + space) and is placed in an e-mail, all kinds of mail clients assume that the dot is ending a sentence and exclude it from the clickable link they add if they detect a URL.
Putting the link between double-quotes or angle-brackets isn't supported in all mail clients and default URL encoding leaves the dot as it is not a reserved URI character...
I ended up appending a trailing ampersand "&" to the URL to fix this, call me crazy but it works.

is a query string with a / in it valid?

Due to a miscommunication with an affiliate partner we're working with the URL they call on our server has been mixed up.
This is the URL they are supposed to call on our server :
/AAAAAAAA/?b=CCCCCCC
unfotunately it was implemented in their system as this
?b=CCCCCCC/AAAAAAA
I can easily parse out the components, but I'm worried that a query string parameter with / in it is not actually a valid URL.
Is a / in a URL actually valid - or should I be concerned. Under what circumstances may an unencoded / cause problems in a query string.
According to RFC 3986: Uniform Resource Identifier (URI): Generic Syntax (from year 2005), yes, / is allowed in the query component. This is the BNF for the query string: (in Appendix A in RFC 3986)
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "#"
The spec says:
The characters slash ("/") and question mark ("?") may represent data within the query component.
as query components are often used to carry identifying information in the form of "key=value" pairs and one frequently used value is a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters
Here is a related question:
Query string: Can a query string contain a URL that also contains query strings?
Although I've never had a problem, they're not technically allowed as per RFC 2396:
Within a query component, the characters ";", "/", "?", ":", "#", "&", "=", "+", ",", and "$" are reserved.
But as I said...I've never run into any issues. I think it's a problem with older browsers more than anything, but maybe someone can shed some more light on a problem this causes?
Slash is a "reserved character" in the query part of a URL per RFC 2396 section 3.4, so according to section 2.2 it has to be encoded. That is, a query part can contain %2F but shouldn't contain /.

Resources