I know absolute path-only URLs (/path/to/resource) are valid, and refer to the same scheme, host, port, etc. as the current resource. Is the URL still valid if the same (or a different!) scheme is added? (http:/path/to/resource or https:/path/to/resource)
If it is valid according to the letter of the spec, how well do browsers handle it? How well do developers that may come across the code in the future handle it?
Addendum:
Here's a simple test case I set up on an Apache server:
resource/number/one/index.html:
<a href="http:/resource/number/two/">link</a>
resource/number/two/index.html:
two
Testing in Chrome 43 on OS X: The URL displayed when hovering over the link looks correct. Clicking the link works as expected. Looking at the DOM in the web inspector, hovering over the a href URL displays an incorrect location (/resource/number/one/http:/resource/number/two/).
Firefox 38 appears to also handle the click correctly. Weird.
No, it’s not valid. From RFC 3986:
4.2. Relative Reference
A relative reference takes advantage of the hierarchical syntax
(Section 1.2.3) to express a URI reference relative to the name space
of another hierarchical URI.
relative-ref  = relative-part [ "?" query ] [ "#" fragment ]
relative-part = "//" authority path-abempty
              / path-absolute
              / path-noscheme
              / path-empty
The URI referred to by a relative reference, also known as the target
URI, is obtained by applying the reference resolution algorithm of
Section 5.
A relative reference that begins with two slash characters is termed
a network-path reference; such references are rarely used. A
relative reference that begins with a single slash character is
termed an absolute-path reference. A relative reference that does
not begin with a slash character is termed a relative-path reference.
A path segment that contains a colon character (e.g., "this:that")
cannot be used as the first segment of a relative-path reference, as
it would be mistaken for a scheme name. Such a segment must be
preceded by a dot-segment (e.g., "./this:that") to make a relative-
path reference.
where path-noscheme is specifically a path that doesn't start with / and whose first segment does not contain a colon, which addresses your question pretty specifically.
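To see how a generic parser reads these strings, here is a minimal sketch using urlsplit from Python's standard urllib.parse module (the choice of Python is an assumption of this example, not something from the question). It only splits the strings into components; it says nothing about how a particular browser resolves them.

```python
from urllib.parse import urlsplit

# "http:" is read as a scheme, so this is not a relative reference:
# it has a scheme, no authority, and an absolute path.
with_scheme = urlsplit("http:/path/to/resource")

# A first segment containing a colon is mistaken for a scheme ...
ambiguous = urlsplit("this:that")

# ... unless it is preceded by a dot-segment, as the RFC text suggests.
dotted = urlsplit("./this:that")

for parts in (with_scheme, ambiguous, dotted):
    print(repr(parts.scheme), repr(parts.netloc), repr(parts.path))
```

The first result carries the scheme http and an empty authority, which is why the string cannot be treated as a path-only reference.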
While reading the rules in RFC 6690 for determining the context URI in link format,
I failed to understand what it means by "link format resource's base URI".
2.1. Target and Context URIs
Each link conveys one target URI as a URI-reference inside angle
brackets ("<>"). The context URI of a link (also called the base URI
in [RFC3986]) is determined by the following rules in this
specification:
(a) The context URI is set to the anchor parameter, when specified.
My understanding: simply check for "anchor" attribute
(b) Origin of the target URI, when specified.
My understanding: If target URI is an absolute uri (contains origin) then use its origin (without path and query) as context
(c) Origin of the link format resource's base URI.
My understanding: I'm lost, where should I look for this base uri?
I understand that an origin is defined as a combination of URI scheme, host name, and port number
I also understand that a base URI is an absolute URI against which relative URIs can be resolved.
But I fail to understand what "base URI" means in the context of RFC 6690, Section 2.1.
If the target URI is not an absolute URI and has no origin, how can I find the origin of the link format resource's base URI?
RFC6690 mixes concepts here ("The context URI of a link (also called the base URI[...])" -- context and base URI are distinct concepts). What they mean by base URI here is the URI pointed to by the anchor attribute (which, when missing, defaults to the URI the document was requested from).
The prevalent interpretation seems to be that the link's context is the anchor attribute resolved against the origin of the requested URI (when there is no anchor, it is the origin of the requested URI), and the link's target is the part between the angle brackets resolved against the origin of the context. This is not exactly what is written there, but at least it works with the examples given in the same document.
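The interpretation described above can be sketched roughly as follows. This is only an illustration of that reading, not the algorithm from RFC 6690; the helper names origin_of and resolve_link and the example URIs are made up for this sketch. It uses http URIs because Python's urljoin only resolves relative references for schemes in its built-in list, which to my knowledge does not include coap.

```python
from typing import Optional, Tuple
from urllib.parse import urljoin, urlsplit


def origin_of(uri: str) -> str:
    """Return scheme://host[:port] of an absolute URI (hypothetical helper)."""
    parts = urlsplit(uri)
    return f"{parts.scheme}://{parts.netloc}"


def resolve_link(requested_uri: str, target_ref: str,
                 anchor: Optional[str] = None) -> Tuple[str, str]:
    """Return (context, target) under the 'prevalent interpretation'."""
    request_origin = origin_of(requested_uri)
    # Context: the anchor resolved against the origin of the requested URI,
    # or that origin itself when no anchor is given.
    context = request_origin if anchor is None else urljoin(request_origin, anchor)
    # Target: the reference in angle brackets resolved against the
    # origin of the context.
    target = urljoin(origin_of(context), target_ref)
    return context, target


print(resolve_link("http://example.org/.well-known/core", "/sensors/temp"))
print(resolve_link("http://example.org/.well-known/core", "/y",
                   anchor="http://other.example/x"))
```

In the second call the anchor is absolute, so both the context and the target end up on the anchor's origin rather than on the origin the document was requested from.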
The rules as set out there are so confusing (and, worst of all, different from the very similar ones in the Link header) that even if you follow them to the letter, you cannot expect interoperability: of all the implementations I surveyed for the CoRE mailing list, none considered the anchor properly in the resolution. I suggest you stick with Limited Link Format (defined in the Resource Directory Draft), which is compatible with the resolution steps of both Link headers and RFC6690, and is accompanied by a walk-through.
(I do have high hopes for all link-format to be replaced by CoRAL in the long run, but that has not progressed far enough that I would recommend it for implementation anywhere near a production environment.)
I've been reading about URLs: absolute, scheme relative, root relative, location relative.
I still don't understand the difference between these two:
//domain.com/index.html - scheme relative
domain.com/index.html - ?
Question 1:
Correct me if I am wrong: //domain.com/index.html will resolve to an absolute URL like one of these:
http://domain.com/index.html
https://domain.com/index.html
ftp://domain.com/index.html
file://domain.com/index.html -- if in email
And browsers will act differently: IE6 doesn't support it, and IE7 and IE8 will fetch the data twice (http and https).
Question 2:
How will domain.com/index.html resolve? Same as scheme relative url in Q1? Or is it something else?
Question 3:
Is there any difference between these URLs? If so, what is it and why?
//www.domain.com/index.html
www.domain.com/index.html
Question 4:
How will //www.domain.com/index.html resolve?
Question 5:
How will www.domain.com/index.html resolve?
It's very easy, looking at URLs like these, to apply your human knowledge of what they probably mean, rather than the much simpler rules implemented by software like web browsers.
The simplest type of URL (or more accurately URI, since some schemes don't represent a Location, only an Identifier) is absolute; it starts with a scheme, then a colon, and no context is needed to resolve it. Examples:
http://example.com
https://www.example.com/foo/bar.baz
http://127.0.0.1:8001
mailto:someone@example.com
data:text/plain,test
urn:example
Then there are location-relative URLs; that is, anything without a scheme, and without a leading slash. These replace everything after the last slash in the current context, but leave the rest in place. If the current context is http://example.com/foo/bar.baz, you could have relative URLs like so:
bob.baz -> http://example.com/foo/bob.baz
thing/widget.gizmo -> http://example.com/foo/thing/widget.gizmo
example.com/page -> http://example.com/foo/example.com/page
Note that that last example looks like a domain name at first glance, but is actually exactly the same as all the other relative URLs.
Root-relative URLs, with a leading slash, are similar, but instead of replacing everything after the last slash, they replace the entire path, that is, everything from the first slash after the host. Given the same context, the previous examples become:
/bob.baz -> http://example.com/bob.baz
/thing/widget.gizmo -> http://example.com/thing/widget.gizmo
/example.com/page -> http://example.com/example.com/page
A root-relative URL could also contain a colon, because the leading slash cannot be part of a scheme prefix:
/foo:bar -> http://example.com/foo:bar
/urn:example -> http://example.com/urn:example
Finally, there are scheme-relative URLs, with two leading slashes. They replace everything after the original double-slash, so keep only the scheme:
if the context is http://example.com/foo/bar then //example.org/bob means http://example.org/bob
if the context is https://example.com/foo/bar then //example.org/bob means https://example.org/bob
if the context is http://example.com, then //foo.bar means http://foo.bar
Note that that last example doesn't look like a domain name to us, but it still follows the same rules. Whether a URL is actually useful is not taken into account when parsing any of the relative forms.
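The resolutions listed above can be reproduced with urljoin from Python's standard library. This is just a sketch to check the examples with a generic resolver; the choice of Python is an assumption of this example, and browsers follow their own URL parsing rules, which may differ in edge cases.

```python
from urllib.parse import urljoin

base = "http://example.com/foo/bar.baz"

# (context, reference) pairs taken from the examples in this answer.
cases = [
    (base, "bob.baz"),
    (base, "thing/widget.gizmo"),
    (base, "example.com/page"),       # looks like a domain, is a relative path
    (base, "/bob.baz"),
    (base, "/example.com/page"),
    (base, "/urn:example"),           # colon is fine after a leading slash
    ("http://example.com/foo/bar", "//example.org/bob"),
    ("https://example.com/foo/bar", "//example.org/bob"),
    ("http://example.com", "//foo.bar"),
]

resolved = [urljoin(context, ref) for context, ref in cases]

for (context, ref), result in zip(cases, resolved):
    print(f"{ref!r} in {context!r} -> {result!r}")
```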
Conventions like "begins with www." and "ends with .com" cannot be relied on, and are not used to determine if a URL is relative or not, so all you need do to answer all your questions is follow this simple set of rules:
If there are two leading slashes, it is scheme relative
If there is one leading slash, it is root relative
If there is no leading slash, but there is a colon, assume it is an absolute URI
If there is no leading slash, and no colon, it is location relative
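These four rules are small enough to write down as a function. The sketch below is one possible rendering; the function name classify is made up for illustration. It also makes one assumption beyond the list: the colon check only looks at the part before the first slash, question mark, or hash, since a colon later in the reference cannot be part of a scheme.

```python
import re


def classify(ref: str) -> str:
    """Classify a URL reference using the four rules above (hypothetical helper)."""
    if ref.startswith("//"):
        return "scheme-relative"
    if ref.startswith("/"):
        return "root-relative"
    # Only a colon in the first segment can terminate a scheme.
    first_segment = re.split(r"[/?#]", ref, maxsplit=1)[0]
    if ":" in first_segment:
        return "absolute"
    return "location-relative"


for ref in ("//domain.com/index.html", "domain.com/index.html",
            "//www.domain.com/index.html", "www.domain.com/index.html",
            "/urn:example", "mailto:someone@example.com"):
    print(ref, "->", classify(ref))
```

Applied to the URLs from the question, the two forms without leading slashes come out as location-relative, no matter how much they look like host names.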
They are very different. The second one is a relative reference to a path "domain.com/index.html".
WRT "domain.com" vs "www.domain.com": these are simply different host names (or path names in the second variant)
I have a question regarding URLs:
I've read RFC 3986 and still have a question about one URL:
If a URI contains an authority component, then the path component
must either be empty or begin with a slash ("/") character. If a URI
does not contain an authority component, then the path cannot begin
with two slash characters ("//"). In addition, a URI reference
(Section 4.1) may be a relative-path reference, in which case the
first path segment cannot contain a colon (":") character. The ABNF
requires five separate rules to disambiguate these cases, only one of
which will match the path substring within a given URI reference. We
use the generic term "path component" to describe the URI substring
matched by the parser to one of these rules.
I know that //server.com:80/path/info is valid (it is a scheme-relative URL).
I also know that http://server.com:80/path//info is valid.
But I am not sure whether the following one is valid:
http://server.com:80//path/info
The problem behind my question is that a cookie created for http://server.com:80/path/info and restricted to the path /path is not sent to http://server.com:80//path/info.
See url with multiple forward slashes, does it break anything?, Are there any downsides to using double-slashes in URLs?, What does the double slash mean in URLs? and RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax.
Consensus: browsers send the request as-is; they do not alter it. The / character is the path separator, but path segments are defined as:
path-abempty = *( "/" segment )
segment = *pchar
Since a segment may be empty (*pchar matches zero characters), the slash after http://example.com can be directly followed by another slash, ad infinitum. Servers might ignore the extra slashes, but browsers don't, as you have found out.
The phrase:
If a URI does not contain an authority component, then the path cannot begin
with two slash characters ("//").
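exists so that a path is never mistaken for an authority: in a reference without an authority, a leading // would be read as the start of one, which is exactly how protocol-relative URLs work. Your URL already has an authority (server.com:80 in your example), so the restriction does not apply to its path.
So: yes, it is valid; no, don't use it.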
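The cookie symptom follows from the path being taken literally. As a rough sketch, the function below is a simplified version of the path-match rule from RFC 6265, Section 5.1.4; the function name is made up for this example, and real browsers do more than this (domain matching, default paths, and so on).

```python
from urllib.parse import urlsplit


def cookie_path_matches(request_path: str, cookie_path: str) -> bool:
    """Simplified path-match in the spirit of RFC 6265, Section 5.1.4."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # The prefix must end at a path-segment boundary.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


single = urlsplit("http://server.com:80/path/info").path
double = urlsplit("http://server.com:80//path/info").path

print(single, cookie_path_matches(single, "/path"))
print(double, cookie_path_matches(double, "/path"))
```

The doubled slash stays in the path component, so the request path no longer starts with /path and the cookie is not a match.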
According to RFC 3986 Section 3 - Syntax Components:
The scheme and path components are required, though the path may be
empty (no characters).
Can someone clarify how the path component can be required if it's able to be empty? Maybe I'm misunderstanding the definition of "required" in this context, but I assumed it to mean something along the lines of "must be non-empty," which obviously conflicts with the spec here.
Here, "required" means merely "always present": the scheme and path
components of an absolute URI are always present.
The scheme component can't be empty because the production
"scheme" requires at least one character.
The path component can be empty because the production
"path-empty" (part of "hier-part") consists of zero characters.
A common practical example of an empty - more precisely, an abempty - path is a URI like http://stackoverflow.com where the path is empty. The authority component (in this case it is stackoverflow.com) alone isn't enough information to identify a resource.
When the authority is empty, the path must begin with a / in order to distinguish the path from the authority - scheme:/// is a valid URI - hence an abempty path. Also take a look at this answer for further reading.
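A quick way to see "always present, possibly empty" in practice is to split a few URIs with Python's standard library. This is only an illustrative sketch (Python is my choice here, not something from the question): the path attribute exists on every result, even when it is the empty string.

```python
from urllib.parse import urlsplit

no_path = urlsplit("http://stackoverflow.com")
root_path = urlsplit("http://stackoverflow.com/")
empty_authority = urlsplit("scheme:///")

for parts in (no_path, root_path, empty_authority):
    # The path component is always there; it may simply have zero characters.
    print(repr(parts.scheme), repr(parts.netloc), repr(parts.path))
```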
I just learned from a colleague that omitting the "http | https" part of a URL in a link will make that URL use whatever scheme the page it's on uses.
So for example, if my page is accessed at http://www.example.com and I have a link (notice the '//' at the front):
<a href="//www.google.com">Google</a>
That link will go to http://www.google.com.
But if I access the page at https://www.example.com with the same link, it will go to https://www.google.com
I wanted to look online for more information about this, but I'm having trouble thinking of a good search phrase. If I search for "URLs without HTTP" the pages returned are about urls with this form: "www.example.com", which is not what I'm looking for.
Would you call that a schemeless URL? A protocol-less URL?
Does this work in all browsers? I tested it in FF and IE 8 and it worked in both. Is this part of a standard, or should I test more browsers?
Protocol relative URL
You may receive unusual security warnings in some browsers.
See also, Wikipedia Protocol-relative URLs for a brief definition.
At one time, it was recommended; but going forward, it should be avoided.
See also the Stack Overflow question Why use protocol-relative URLs at all?.
It is called a network-path reference (the part that is missing is called the scheme or protocol), and it is defined in RFC 3986, Section 4.2:
4.2 Relative Reference
A relative reference takes advantage of the hierarchical syntax
(Section 1.2.3) to express a URI reference relative to the name space
of another hierarchical URI.
relative-ref  = relative-part [ "?" query ] [ "#" fragment ]
relative-part = "//" authority path-abempty
              / path-absolute
              / path-noscheme
              / path-empty
The URI referred to by a relative reference, also known as the target URI, is obtained by applying the reference resolution
algorithm of Section 5.
A relative reference that begins with two slash characters is
termed a network-path reference (emphasis mine); such references are rarely used.
A relative reference that begins with a single slash character is termed an absolute-path reference. A relative reference that does not begin with a slash character is termed a relative-path reference.
A path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name. Such a segment must be preceded by a dot-segment (e.g., "./this:that") to make a relative-path reference.
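The behaviour described in the question, where the link inherits the scheme of the page it is on, can be checked with a generic resolver. The sketch below uses urljoin from Python's standard library as a stand-in for the Section 5 resolution algorithm; it is an illustration, not a statement about any particular browser.

```python
from urllib.parse import urljoin

link = "//www.google.com"

# The same network-path reference, resolved from an http and an https page.
from_http = urljoin("http://www.example.com", link)
from_https = urljoin("https://www.example.com", link)

print(from_http)
print(from_https)
```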