is a query string with a / in it valid? - http

Due to a miscommunication with an affiliate partner we're working with the URL they call on our server has been mixed up.
This is the URL they are supposed to call on our server :
/AAAAAAAA/?b=CCCCCCC
unfotunately it was implemented in their system as this
?b=CCCCCCC/AAAAAAA
I can easily parse out the components, but I'm worried that a query string parameter with / in it is not actually a valid URL.
Is a / in a URL actually valid - or should I be concerned. Under what circumstances may an unencoded / cause problems in a query string.

According to RFC 3986: Uniform Resource Identifier (URI): Generic Syntax (from year 2005), yes, / is allowed in the query component. This is the BNF for the query string: (in Appendix A in RFC 3986)
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "#"
The spec says:
The characters slash ("/") and question mark ("?") may represent data within the query component.
as query components are often used to carry identifying information in the form of "key=value" pairs and one frequently used value is a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters
Here is a related question:
Query string: Can a query string contain a URL that also contains query strings?

Although I've never had a problem, they're not technically allowed as per RFC 2396:
Within a query component, the characters ";", "/", "?", ":", "#", "&", "=", "+", ",", and "$" are reserved.
But as I said...I've never run into any issues. I think it's a problem with older browsers more than anything, but maybe someone can shed some more light on a problem this causes?

Slash is a "reserved character" in the query part of a URL per RFC 2396 section 3.4, so according to section 2.2 it has to be encoded. That is, a query part can contain %2F but shouldn't contain /.

Related

Is the "?" in URLs completely arbitrary (disregarding reserved/non-escaped character problems, etc.)?

For example, if for whatever stupid reason I configured my server to parse the URL by splitting the queries by the "^" symbol (escaped if necessary) and the "-" symbol instead of the "?" and "&", would I run into any trouble at all apart from a confused user?
Will the browser/HTTP request sent treat it differently in a way that may be detrimental to my up and coming "power minus" business?
? is not arbitrary but defined in the URI RFC section 3.4 Query, I dont' think you can change that.
The Query component internal syntax (how name=value couples are encoded) is not defined by the URI RFC, separators can be defined by other specifications:
& is defined as separator of the application/x-www-form-urlencoded content type by HTML Spec. You may change this aspect supporting for example ; as separator, but you would have in any case to support & for when processing the request produced by an HTML FORM.

Should the plus in tel URIs be encoded?

In a URI, spaces can be encoded as +. Since this is the case, should the leading plus be encoded when creating tel URIs with international prefix?
Which is better? Do both work in practice?
Call me
Call me
No.
From section 3 of RFC 3966 (The tel URI for Telephone Numbers):
If the reserved characters "+", ";", "=", and "?" are used as delimiters between components of the "tel" URI, they MUST NOT be percent encoded.
You would only percent-encode a + if it’s part of a parameter value:
These characters ["+", ";", "=", and "?"] MUST be percent encoded if they appear in tel URI parameter values.
I’m not sure if the leading +, which indicates that it’s a global number, counts as delimiter, but the definition of a global number says:
Globally unique numbers are identified by the leading "+" character.
So it refers to +, not to something percent-encoded.
And also the examples make clear that it’s not supposed to be percent-encoded, e.g.:
tel:+1-201-555-0123
Note that spaces in tel URIs (e.g., in parameter values) may not be encoded with a +. Using + instead of %20 for a space character is not something that may be done in any URI; it’s only possible in URIs whose URI scheme explicitly defines that.
The tel: URI scheme doesn't have a provision for encoding spaces - see RFC 3966:
5.1.1. Separators in Phone Numbers
...
even though ITU-T E.123 [E.123] recommends the use of space
characters as visual separators in printed telephone numbers, "tel"
URIs MUST NOT use spaces in visual separators to avoid excessive
escaping.
The plus sign encodes a space specifically only in application/x-www-form-urlencoded (default content type for form submission - see W3C info re: forms). There's no valid way to encode a space in tel: URIs. See again RFC 3966 (page 5) for valid visual separators.

Parameter separator in URLs, the case of misused question mark

What I don't really understand is the benefit of using '?' instead of '&' in urls:
It makes nobody's life easier if we use a different character as the first separator character.
Can you come up with a reasonable explanation?
EDIT: after more research I found that "&" can be a part of file name (terms&conditions.html) so "?" is a good separator. But still I think using "?" for separators makes lives easier (from url generators and parsers point of view):
Is there any advantage in using "&" which is not clear at the first glance?
From the URI spec's (RFC 3986) point of view, the only separator here is "?". the format of the query is opaque; the ampersands just are something that HTML happens to use for form submissions.
The answer's pretty much in this article - http://www.skorks.com/2010/05/what-every-developer-should-know-about-urls/ . To highlight it, here goes :
Query is the preferred way to send some parameters to a resource on
the server. These are key=value pairs and are separated from the rest
of the URL by a ? (question mark) character and are normally separated
from each other by & (ampersand) characters. What you may not know is
the fact that it is legal to separate them from each other by the ;
(semi-colon) character as well. The following URLs are equivalent:
http://www.blah.com/some/crazy/path.html?param1=foo&param2=bar
http://www.blah.com/some/crazy/path.html?param1=foo;param2
The RFC 3896 (https://www.ietf.org/rfc/rfc3986.txt) defines general and sub delimiters ... '?' is a general, '&' and ';' are sub. The spec is pretty clear about that.
In this case the latter '?' chars would be treated as part of the query. If the query parser follows the spec strictly, it would then pass the whole query on to the app-destination. If the app-destination could choose to further process the query string in a manner which treats the ? as a param name-value pairs delimiter, that is up to the app's designers.
My guess is that this often 'just works' because code that splits query strings and the original uri uses all delimiters for matching: 1) first query is split on '?' then 2) query string is parsed using char match list that includes '?' (convenience only).... This could be occurring in ubiquitous parsing libraries already.

Is a trailing ampersand legal in a URL?

A URL like
http://localhost/path?a=b&c=d
is fine - but what is the status of the same URL with a trailing ampersand?
http://localhost/path?a=b&c=d&
For example the Java Servlet API allows it where Scala's Spray does not (ie it throws an error).
I've tried to find the answer in the URI syntax spec, but not sure how to parse their grammar.
The URI syntax spec is for generic URIs. It allows anything in the query. I am not aware of any specification which actually specifies ampersand-separated key=value pairs. I believe it is merely convention. I know that PHP, for example, offers an option for using a different separator. But now, everyone uses ampersand-separated things when they want key-value pairs. You still occasionally come across things which use it for just a simple string, e.g. http://example.com/?example. This is perfectly valid.
The basic answer, though, is that & is valid anywhere in the query string, including at the end.
Demistifying the RFC syntax, or Why & is valid anywhere in the query string:
First, you have
query = *( pchar / "/" / "?" )
(So a query string is made of any number of pchar and literal slashes and question marks.)
Going back, you have
pchar = unreserved / pct-encoded / sub-delims / ":" / "#"
And earlier still
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
So a literal & is in sub-delims which is in pchar, so it's valid in query
I think there is an unwritten rule that all RFCs must be near impossible to understand. You're not the first person unable to parse the grammar and - in my humble opinion - Spray also failed too.
There is nothing wrong with a trailing ampersand. It is a legal character in the URI used to separate parameters. A trailing ampersand may be pointless, but it isn't invalid. Spray should (again, only in my opinion) be simply ignoring it.
If you have a url like the following in a plain-text email:
Please follow this link https://server/doc?param=test. to reset the password of your account.
It becomes a useful feature since if the last parameter ends with ". " (dot + space) and is placed in an e-mail, all kinds of mail clients assume that the dot is ending a sentence and exclude it from the clickable link they add if they detect a URL.
Putting the link between double-quotes or angle-brackets isn't supported in all mail clients and default URL encoding leaves the dot as it is not a reserved URI character...
I ended up appending a trailing ampersand "&" to the URL to fix this, call me crazy but it works.

Query string: Can a query string contain a URL that also contains query strings?

Example:
http://foo.com/generatepdf.aspx?u=http://foo.com/somepage.aspx?color=blue&size=15
I added the iis tag because I am guessing it also depends on what server technology you use?
The server technology shouldn't make a difference.
When you pass a value to a query string you need to url encode the name/value pair. If you want to pass in a value that contains a special character such as a question mark (?) you'll just need to encode that character as %3F. If you then needed to recursively pass another query string to the encoded url, you'll need to double/triple/etc encode the url resulting in the original ? turning into %253F, %25253F, etc.
you'll probably want to UrlEncode the url that is in the query string.
As reported in http://en.wikipedia.org/wiki/Query_string
W3C recommends that all web servers support semicolon separators in
addition to ampersand separators (link reported on that wiki page) to allow
application/x-www-form-urlencoded query strings in URLs within HTML
documents without having to entity escape ampersands.
So, I suppose the answer to the question is yes and you have to change in a ";" semicolon the "&" ampersand usaully used for key=value separator.
Yes it can, as far as I can tell, according to RFC 3986: Uniform Resource Identifier (URI): Generic Syntax (from year 2005):
This is the BNF for the query string:
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "#"
The spec says:
The characters slash ("/") and question mark ("?") may represent data within the query component.
as query components are often used to carry identifying information in the form of "key=value" pairs and one frequently used value is a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters
(But I suppose your server framework might or might not follow the specification exactly.)
No, but you can encode the url and decode it later.

Resources