Different query/parameter separators in URIs - uri

I wonder what the different parameter separators in (SIP) URIs indicate?
Some separated by ;, like: <sip:user#domain.com;foo=bar;x=y>.
Other are separated by ? and &, like: <sip:user#domain.com?foo=bar&x=y>

SIP separator rules comes from RFC 2396 that has been deprecated by RFC 3986. But with some usage specifications defined in section 19.1.1.
To summarize a bit, semicolon ";" is used to separate URI parameters, question mark "?" to signal query component's (stated as "Header fields" in section 19.1.1) starting point, and ampersand "&" is used to separate parameter pairs inside query string ("header fields").
Also worth checking Wikipedia entry: URI scheme
Hope this helps

Related

Is the "?" in URLs completely arbitrary (disregarding reserved/non-escaped character problems, etc.)?

For example, if for whatever stupid reason I configured my server to parse the URL by splitting the queries by the "^" symbol (escaped if necessary) and the "-" symbol instead of the "?" and "&", would I run into any trouble at all apart from a confused user?
Will the browser/HTTP request sent treat it differently in a way that may be detrimental to my up and coming "power minus" business?
? is not arbitrary but defined in the URI RFC section 3.4 Query, I dont' think you can change that.
The Query component internal syntax (how name=value couples are encoded) is not defined by the URI RFC, separators can be defined by other specifications:
& is defined as separator of the application/x-www-form-urlencoded content type by HTML Spec. You may change this aspect supporting for example ; as separator, but you would have in any case to support & for when processing the request produced by an HTML FORM.

Can HTTP headers contain colons in the field value?

I've been working with HTTP headers recently. I am parsing field and value from HTTP header requesrts based on the colon separated mandated by RFC. In python:
header_request_line.split(":")
However, this messes up if colons are allowed in the value fields. Consider:
User-Agent: Mozilla:4.0
which would be split into 3 strings, not 2 as I wanted.
Yes. So you can do something like this (pseudo):
header = "User-Agent: Mozilla:4.0"
headerParts = header.split(":")
key = headerParts[0]
value = headerParts.substring(key.length).trim()
// or
value = headerParts.skip(1).join(":")
But you'll probably run into various issues when parsing headers from various servers, so why not use a library?
Yes it can
In your example you might simply use split with maxsplit parameter specified like this:
header_request_line.split(":", 1)
It would produce the following result and would work despite the number of colons in the field value:
In [2]: 'User-Agent: Mozilla:4.0'.split(':', 1)
Out[2]: ['User-Agent', ' Mozilla:4.0']
Per RFC 7230, the answer is Yes.
The Header Value is a combination of {token, quoted-string, comment}, separated by delimiters. The delimiter may be a colon.
So a header like
User-Agent: Mozilla:4.0
has a value that consists of two tokens (Mozilla, 4.0) separated by a colon.
Nobody asked this specifically, but... in my opinion while colon is OK, and a quoted string is ok, it feels like poor style, to me, to use a JSON string as a header value.
My-Header: {"foo":"bar","prop2":12345}
..probably would work ok, but it doesn't comply with the intent of sec. 3.2.6 of RFC7230. Specifically { " , : are all delimiters... and some of them are consecutive in this JSON. A generic parser of HTTP header values that conforms to RFC7230 wouldn't be happy with that value. If your system needs that, then a better idea may be to URL-encode that value.
My-Header: %7B%22foo%22%3A%22bar%22%2C%22prop2%22%3A12345%7D
But that will probably be overkill in most cases. Probably you will be safe to insert JSON as a HTTP Header value.

Parameter separator in URLs, the case of misused question mark

What I don't really understand is the benefit of using '?' instead of '&' in urls:
It makes nobody's life easier if we use a different character as the first separator character.
Can you come up with a reasonable explanation?
EDIT: after more research I found that "&" can be a part of file name (terms&conditions.html) so "?" is a good separator. But still I think using "?" for separators makes lives easier (from url generators and parsers point of view):
Is there any advantage in using "&" which is not clear at the first glance?
From the URI spec's (RFC 3986) point of view, the only separator here is "?". the format of the query is opaque; the ampersands just are something that HTML happens to use for form submissions.
The answer's pretty much in this article - http://www.skorks.com/2010/05/what-every-developer-should-know-about-urls/ . To highlight it, here goes :
Query is the preferred way to send some parameters to a resource on
the server. These are key=value pairs and are separated from the rest
of the URL by a ? (question mark) character and are normally separated
from each other by & (ampersand) characters. What you may not know is
the fact that it is legal to separate them from each other by the ;
(semi-colon) character as well. The following URLs are equivalent:
http://www.blah.com/some/crazy/path.html?param1=foo&param2=bar
http://www.blah.com/some/crazy/path.html?param1=foo;param2
The RFC 3896 (https://www.ietf.org/rfc/rfc3986.txt) defines general and sub delimiters ... '?' is a general, '&' and ';' are sub. The spec is pretty clear about that.
In this case the latter '?' chars would be treated as part of the query. If the query parser follows the spec strictly, it would then pass the whole query on to the app-destination. If the app-destination could choose to further process the query string in a manner which treats the ? as a param name-value pairs delimiter, that is up to the app's designers.
My guess is that this often 'just works' because code that splits query strings and the original uri uses all delimiters for matching: 1) first query is split on '?' then 2) query string is parsed using char match list that includes '?' (convenience only).... This could be occurring in ubiquitous parsing libraries already.

Query string: Can a query string contain a URL that also contains query strings?

Example:
http://foo.com/generatepdf.aspx?u=http://foo.com/somepage.aspx?color=blue&size=15
I added the iis tag because I am guessing it also depends on what server technology you use?
The server technology shouldn't make a difference.
When you pass a value to a query string you need to url encode the name/value pair. If you want to pass in a value that contains a special character such as a question mark (?) you'll just need to encode that character as %3F. If you then needed to recursively pass another query string to the encoded url, you'll need to double/triple/etc encode the url resulting in the original ? turning into %253F, %25253F, etc.
you'll probably want to UrlEncode the url that is in the query string.
As reported in http://en.wikipedia.org/wiki/Query_string
W3C recommends that all web servers support semicolon separators in
addition to ampersand separators (link reported on that wiki page) to allow
application/x-www-form-urlencoded query strings in URLs within HTML
documents without having to entity escape ampersands.
So, I suppose the answer to the question is yes and you have to change in a ";" semicolon the "&" ampersand usaully used for key=value separator.
Yes it can, as far as I can tell, according to RFC 3986: Uniform Resource Identifier (URI): Generic Syntax (from year 2005):
This is the BNF for the query string:
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "#"
The spec says:
The characters slash ("/") and question mark ("?") may represent data within the query component.
as query components are often used to carry identifying information in the form of "key=value" pairs and one frequently used value is a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters
(But I suppose your server framework might or might not follow the specification exactly.)
No, but you can encode the url and decode it later.

Multiple params in http get request

Are urls of the form http://asdf.com/something.do?param1=true?param2=false valid?
I don't think the second ? is allowed in valid urls and that it should instead be an ampersand (&), but I'm unable to find anything about this in the http 1.1 rfc. Any ideas?
It is not valid to use ? again. ? should indicate the start of the parameter list. & should separate parameters.
From RFC 3986:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
From RFC 1738 :
An HTTP URL takes the form:
http:// <host> : <port> / <path> ? <searchpart>
where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 80. No user name or password is
allowed. <path> is an HTTP selector, and <searchpart> is a query
string. The <path> is optional, as is the <searchpart> and its
preceding "?". If neither <path> nor <searchpart> is present, the "/"
may also be omitted.
Within the <path> and <searchpart> components, "/", ";", "?" are
reserved. The "/" character may be used within HTTP to designate a
hierarchical structure.
The search part/query part is described here.
use & for the second and third
i.e. http://asdf.com/something.do?param1=true&param2=false
application/x-www-form-urlencoded
This is the default content type. Forms submitted with this content type must be encoded as follows:
Control names and values are escaped. Space characters are replaced by +, and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by %HH, a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., %0D%0A).
The control names/values are listed in the order they appear in the document. The name is separated from the value by = and name/value pairs are separated from each other by &.
— application/x-www-form-urlencoded
As mentioned, it's not valid to use it again. However, if you have the ? character as part of a parameter value, you can encode it as %63 (just like the space character which gets encoded as %20).

Resources