I have a question regarding percent-encoding of email-addresses (RFC 5322) in mailto URIs (RFC 6068 / RFC 3986).
EMail-addresses may contain - among other otherwise forbidden characters - the "#" (at-character) within the local-part if it is in double-quotes. That is,
<"not#me"#example.org>
is a valid address. (Note that I use angle-brackets as delimiters -they are not part of the address.)
This is an example I found in RFC6068, and according to it the corresponding mailto URI is:
<mailto:%22not%40me%22#example.org>.
Looking at the syntax rules, however, it is not clear to me whether it is necessary to percent encode the "#", or if the following URI would be valid, too:
<mailto:%22not#me%22#example.org>.
That is: Which rule requires the "#" in the local-part to be escaped? Is it because I have percent-encoded the <"> double quotes, and now the rule "# has to be in double quotes if it appears in the local-part" is not applicable anymore?
Related
In a URI, spaces can be encoded as +. Since this is the case, should the leading plus be encoded when creating tel URIs with international prefix?
Which is better? Do both work in practice?
Call me
Call me
No.
From section 3 of RFC 3966 (The tel URI for Telephone Numbers):
If the reserved characters "+", ";", "=", and "?" are used as delimiters between components of the "tel" URI, they MUST NOT be percent encoded.
You would only percent-encode a + if it’s part of a parameter value:
These characters ["+", ";", "=", and "?"] MUST be percent encoded if they appear in tel URI parameter values.
I’m not sure if the leading +, which indicates that it’s a global number, counts as delimiter, but the definition of a global number says:
Globally unique numbers are identified by the leading "+" character.
So it refers to +, not to something percent-encoded.
And also the examples make clear that it’s not supposed to be percent-encoded, e.g.:
tel:+1-201-555-0123
Note that spaces in tel URIs (e.g., in parameter values) may not be encoded with a +. Using + instead of %20 for a space character is not something that may be done in any URI; it’s only possible in URIs whose URI scheme explicitly defines that.
The tel: URI scheme doesn't have a provision for encoding spaces - see RFC 3966:
5.1.1. Separators in Phone Numbers
...
even though ITU-T E.123 [E.123] recommends the use of space
characters as visual separators in printed telephone numbers, "tel"
URIs MUST NOT use spaces in visual separators to avoid excessive
escaping.
The plus sign encodes a space specifically only in application/x-www-form-urlencoded (default content type for form submission - see W3C info re: forms). There's no valid way to encode a space in tel: URIs. See again RFC 3966 (page 5) for valid visual separators.
I have a question regarding URLs:
I've read the RFC 3986 and still have a question about one URL:
If a URI contains an authority component, then the path component
must either be empty or begin with a slash ("/") character. If a URI
does not contain an authority component, then the path cannot begin
with two slash characters ("//"). In addition, a URI reference
(Section 4.1) may be a relative-path reference, in which case the
first path segment cannot contain a colon (":") character. The ABNF
requires five separate rules to disambiguate these cases, only one of
which will match the path substring within a given URI reference. We
use the generic term "path component" to describe the URI substring
matched by the parser to one of these rules.
I know, that //server.com:80/path/info is valid (it is a schema relative URL)
I also know that http://server.com:80/path//info is valid.
But I am not sure whether the following one is valid:
http://server.com:80//path/info
The problem behind my question is, that a cookie is not sent to http://server.com:80//path/info, when created by the URI http://server.com:80/path/info with restriction to /path
See url with multiple forward slashes, does it break anything?, Are there any downsides to using double-slashes in URLs?, What does the double slash mean in URLs? and RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax.
Consensus: browsers will do the request as-is, they will not alter the request. The / character is the path separator, but as path segments are defined as:
path-abempty = *( "/" segment )
segment = *pchar
Means the slash after http://example.com/ can directly be followed by another slash, ad infinitum. Servers might ignore it, but browsers don't, as you have figured out.
The phrase:
If a URI does not contain an authority component, then the path cannot begin
with two slash characters ("//").
Allows for protocol-relative URLs, but specifically states in that case no authority (server.com:80 in your example) may be present.
So: yes, it is valid, no, don't use it.
A URL like
http://localhost/path?a=b&c=d
is fine - but what is the status of the same URL with a trailing ampersand?
http://localhost/path?a=b&c=d&
For example the Java Servlet API allows it where Scala's Spray does not (ie it throws an error).
I've tried to find the answer in the URI syntax spec, but not sure how to parse their grammar.
The URI syntax spec is for generic URIs. It allows anything in the query. I am not aware of any specification which actually specifies ampersand-separated key=value pairs. I believe it is merely convention. I know that PHP, for example, offers an option for using a different separator. But now, everyone uses ampersand-separated things when they want key-value pairs. You still occasionally come across things which use it for just a simple string, e.g. http://example.com/?example. This is perfectly valid.
The basic answer, though, is that & is valid anywhere in the query string, including at the end.
Demistifying the RFC syntax, or Why & is valid anywhere in the query string:
First, you have
query = *( pchar / "/" / "?" )
(So a query string is made of any number of pchar and literal slashes and question marks.)
Going back, you have
pchar = unreserved / pct-encoded / sub-delims / ":" / "#"
And earlier still
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
So a literal & is in sub-delims which is in pchar, so it's valid in query
I think there is an unwritten rule that all RFCs must be near impossible to understand. You're not the first person unable to parse the grammar and - in my humble opinion - Spray also failed too.
There is nothing wrong with a trailing ampersand. It is a legal character in the URI used to separate parameters. A trailing ampersand may be pointless, but it isn't invalid. Spray should (again, only in my opinion) be simply ignoring it.
If you have a url like the following in a plain-text email:
Please follow this link https://server/doc?param=test. to reset the password of your account.
It becomes a useful feature since if the last parameter ends with ". " (dot + space) and is placed in an e-mail, all kinds of mail clients assume that the dot is ending a sentence and exclude it from the clickable link they add if they detect a URL.
Putting the link between double-quotes or angle-brackets isn't supported in all mail clients and default URL encoding leaves the dot as it is not a reserved URI character...
I ended up appending a trailing ampersand "&" to the URL to fix this, call me crazy but it works.
Due to a miscommunication with an affiliate partner we're working with the URL they call on our server has been mixed up.
This is the URL they are supposed to call on our server :
/AAAAAAAA/?b=CCCCCCC
unfotunately it was implemented in their system as this
?b=CCCCCCC/AAAAAAA
I can easily parse out the components, but I'm worried that a query string parameter with / in it is not actually a valid URL.
Is a / in a URL actually valid - or should I be concerned. Under what circumstances may an unencoded / cause problems in a query string.
According to RFC 3986: Uniform Resource Identifier (URI): Generic Syntax (from year 2005), yes, / is allowed in the query component. This is the BNF for the query string: (in Appendix A in RFC 3986)
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "#"
The spec says:
The characters slash ("/") and question mark ("?") may represent data within the query component.
as query components are often used to carry identifying information in the form of "key=value" pairs and one frequently used value is a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters
Here is a related question:
Query string: Can a query string contain a URL that also contains query strings?
Although I've never had a problem, they're not technically allowed as per RFC 2396:
Within a query component, the characters ";", "/", "?", ":", "#", "&", "=", "+", ",", and "$" are reserved.
But as I said...I've never run into any issues. I think it's a problem with older browsers more than anything, but maybe someone can shed some more light on a problem this causes?
Slash is a "reserved character" in the query part of a URL per RFC 2396 section 3.4, so according to section 2.2 it has to be encoded. That is, a query part can contain %2F but shouldn't contain /.
Are urls of the form http://asdf.com/something.do?param1=true?param2=false valid?
I don't think the second ? is allowed in valid urls and that it should instead be an ampersand (&), but I'm unable to find anything about this in the http 1.1 rfc. Any ideas?
It is not valid to use ? again. ? should indicate the start of the parameter list. & should separate parameters.
From RFC 3986:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
From RFC 1738 :
An HTTP URL takes the form:
http:// <host> : <port> / <path> ? <searchpart>
where <host> and <port> are as described in Section 3.1. If :<port>
is omitted, the port defaults to 80. No user name or password is
allowed. <path> is an HTTP selector, and <searchpart> is a query
string. The <path> is optional, as is the <searchpart> and its
preceding "?". If neither <path> nor <searchpart> is present, the "/"
may also be omitted.
Within the <path> and <searchpart> components, "/", ";", "?" are
reserved. The "/" character may be used within HTTP to designate a
hierarchical structure.
The search part/query part is described here.
use & for the second and third
i.e. http://asdf.com/something.do?param1=true¶m2=false
application/x-www-form-urlencoded
This is the default content type. Forms submitted with this content type must be encoded as follows:
Control names and values are escaped. Space characters are replaced by +, and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by %HH, a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., %0D%0A).
The control names/values are listed in the order they appear in the document. The name is separated from the value by = and name/value pairs are separated from each other by &.
— application/x-www-form-urlencoded
As mentioned, it's not valid to use it again. However, if you have the ? character as part of a parameter value, you can encode it as %63 (just like the space character which gets encoded as %20).