When I was reading RFC 6690 on the rules for determining the context URI in link format,
I failed to understand what it means by "link format resource's base URI".
2.1. Target and Context URIs
Each link conveys one target URI as a URI-reference inside angle
brackets ("<>"). The context URI of a link (also called the base URI
in [RFC3986]) is determined by the following rules in this
specification:
(a) The context URI is set to the anchor parameter, when specified.
My understanding: simply check for the "anchor" attribute.
(b) Origin of the target URI, when specified.
My understanding: if the target URI is an absolute URI (contains an origin), then use its origin (without path and query) as the context.
(c) Origin of the link format resource's base URI.
My understanding: I'm lost; where should I look for this base URI?
I understand that an origin is defined as the combination of URI scheme, host name, and port number.
I also understand that a base URI is an absolute URI against which relative URIs can be resolved.
But I don't understand what "base URI" means in the context of RFC 6690 Section 2.1.
If the target URI is not an absolute URI and doesn't have an origin, how can I find the origin of the link format resource's base URI?
RFC6690 mixes concepts here ("The context URI of a link (also called the base URI[...])" -- context and base URI are distinct concepts). What they mean by base URI here is the URI pointed to by the anchor attribute (which, when missing, defaults to the URI the document was requested from).
The prevalent interpretation seems to be that the link's context is the anchor attribute resolved against the origin of the requested URI (when there is no anchor, it is the origin of the requested URI), and the link's target is the part between the angle brackets resolved against the origin of the context. This is not exactly what is written there, but at least it works with the examples given in the same document.
The rules as set out there are so confusing (and, worst of all, different from the very similar ones for the Link header) that even if you follow them to the letter, you cannot expect interoperability: of all the implementations I surveyed for the CoRE mailing list, none considered the anchor properly in the resolution. I suggest you stick with Limited Link Format (defined in the Resource Directory draft), which is compatible with the resolution steps of both Link headers and RFC 6690, and is accompanied by a walk-through.
(I do have high hopes for all link-format to be replaced by CoRAL on the long run, but that has not progressed far enough that I would recommend it for implementation anywhere near a production environment.)
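A minimal Python sketch of the "prevalent interpretation" described above (not of the RFC's literal wording). The helper names `origin` and `resolve_link` and all example URIs are hypothetical. It uses http:// URIs because `urllib.parse.urljoin` only applies hierarchical resolution to schemes listed in `urllib.parse.uses_relative`, which by default does not include coap.

```python
from __future__ import annotations

from urllib.parse import urljoin, urlsplit


def origin(uri: str) -> str:
    """Hypothetical helper: scheme + authority of an absolute URI, as a base ending in '/'."""
    parts = urlsplit(uri)
    return f"{parts.scheme}://{parts.netloc}/"


def resolve_link(requested_uri: str, target_ref: str, anchor: str | None = None) -> tuple[str, str]:
    """Return (context, target) for one link under the interpretation above (a sketch)."""
    base = origin(requested_uri)
    # Context: the anchor resolved against the origin of the requested URI,
    # or simply that origin when no anchor is given.
    context = urljoin(base, anchor) if anchor is not None else base
    # Target: the part between the angle brackets, resolved against the origin of the context.
    target = urljoin(origin(context), target_ref)
    return context, target


doc = "http://example.com/.well-known/core"
print(resolve_link(doc, "/sensors/temp"))
print(resolve_link(doc, "temp", anchor="/sensors/"))
print(resolve_link(doc, "y", anchor="http://other.example/a/b"))
```

Note the second call: even with anchor "/sensors/", the relative target "temp" resolves against the origin rather than against the anchor's path, which is the kind of surprise that makes implementations disagree.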
I know absolute path-only URLs (/path/to/resource) are valid, and refer to the same scheme, host, port, etc. as the current resource. Is the URL still valid if the same (or a different!) scheme is added? (http:/path/to/resource or https:/path/to/resource)
If it is valid according to the letter of the spec, how well do browsers handle it? How well do developers that may come across the code in the future handle it?
Addendum:
Here's a simple test case I set up on an Apache server:
resource/number/one/index.html:
<a href="http:/resource/number/two/">link</a>
resource/number/two/index.html:
two
Testing in Chrome 43 on OS X: the URL displayed when hovering over the link looks correct, and clicking the link works as expected. However, in the DOM view of the web inspector, hovering over the href value displays an incorrect location (/resource/number/one/http:/resource/number/two/).
Firefox 38 appears to also handle the click correctly. Weird.
No, it’s not valid. From RFC 3986:
4.2. Relative Reference
A relative reference takes advantage of the hierarchical syntax
(Section 1.2.3) to express a URI reference relative to the name space
of another hierarchical URI.
relative-ref = relative-part [ "?" query ] [ "#" fragment ]
relative-part = "//" authority path-abempty
/ path-absolute
/ path-noscheme
/ path-empty
The URI referred to by a relative reference, also known as the target
URI, is obtained by applying the reference resolution algorithm of
Section 5.
A relative reference that begins with two slash characters is termed
a network-path reference; such references are rarely used. A
relative reference that begins with a single slash character is
termed an absolute-path reference. A relative reference that does
not begin with a slash character is termed a relative-path reference.
A path segment that contains a colon character (e.g., "this:that")
cannot be used as the first segment of a relative-path reference, as
it would be mistaken for a scheme name. Such a segment must be
preceded by a dot-segment (e.g., "./this:that") to make a relative-
path reference.
Here path-noscheme is specifically a path that doesn't start with / and whose first segment does not contain a colon, which addresses your question pretty directly.
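To see the colon rule in action, here is a small sketch using the generic splitting regex from RFC 3986, Appendix B (a generic parser, not a browser's URL parser; the helper name `scheme_of` is hypothetical). A first segment containing ":" is read as a scheme, so "http:/resource/number/two/" is never treated as a relative reference, while "./this:that" is.

```python
import re

# Generic URI-reference splitting regex from RFC 3986, Appendix B.
URI_REFERENCE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def scheme_of(ref: str):
    """Return the scheme a generic parser sees in ref, or None for a relative reference."""
    return URI_REFERENCE.match(ref).group(2)


for ref in ["this:that", "./this:that", "http:/resource/number/two/", "/resource/number/two/"]:
    print(f"{ref!r:32} scheme={scheme_of(ref)!r}")
```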
I'm using Qt 4.6.3
When text browser html has a reference
<a href="myprotocol://ABC">click me!</a>
then on click, it emits the anchorClicked signal with url
myprotocol://abc
How can I fix this? I need the original case to be preserved.
QUrl always lowercases host names.
QUrl conforms to the URI specification from RFC 3986 (Uniform Resource Identifier: Generic Syntax), and includes scheme extensions from RFC 1738 (Uniform Resource Locators). Case folding rules in QUrl conform to RFC 3491 (Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)).
...
Note that the case folding rules in Nameprep, which QUrl conforms to, require host names to always be converted to lower case, regardless of the Qt::FormattingOptions used.
(From Qt 4.7 documentation, closest I can find to 4.6.3)
If you're just using "fake" URLs to pass some data around your application, you can preserve case by using a dummy host name and passing your real information as the path or a query, e.g. myprotocol:///ABC (the same as writing myprotocol://localhost/ABC).
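The same effect can be reproduced outside Qt. This sketch uses Python's `urllib.parse` only as a stand-in for QUrl (it is not QUrl's API): `urlsplit(...).hostname` reports the host in lower case, while the path keeps its case. The helpers `make_url` and `read_url` are hypothetical, showing the "put the payload in the path" workaround.

```python
from urllib.parse import quote, unquote, urlsplit


def make_url(data: str) -> str:
    # Hypothetical helper: carry the case-sensitive payload in the path, not the host.
    return "myprotocol:///" + quote(data)


def read_url(url: str) -> str:
    # Drop the leading "/" of the path and undo the percent-encoding.
    return unquote(urlsplit(url).path[1:])


print(urlsplit("myprotocol://ABC").hostname)  # host is reported in lower case
print(read_url(make_url("ABC")))              # path keeps the original case
```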
In this case, ABC is interpreted as the host (domain) name of your URL. Although the format does not forbid upper case, host names are case-insensitive; for the rest of the URL, it is recommended to treat it as case-sensitive. For example, the W3C states:
URLs in general are case-sensitive (with the exception of machine
names). There may be URLs, or parts of URLs, where case doesn't
matter, but identifying these may not be easy. Users should always
consider that URLs are case-sensitive.
I think browsers follow the same rule: the host part of a URL with upper-case characters is converted to lower case. I tried this in Chrome, Firefox and IE.
I have a question regarding URLs:
I've read RFC 3986 and still have a question about one URL. The RFC says:
If a URI contains an authority component, then the path component
must either be empty or begin with a slash ("/") character. If a URI
does not contain an authority component, then the path cannot begin
with two slash characters ("//"). In addition, a URI reference
(Section 4.1) may be a relative-path reference, in which case the
first path segment cannot contain a colon (":") character. The ABNF
requires five separate rules to disambiguate these cases, only one of
which will match the path substring within a given URI reference. We
use the generic term "path component" to describe the URI substring
matched by the parser to one of these rules.
I know that //server.com:80/path/info is valid (it is a scheme-relative URL).
I also know that http://server.com:80/path//info is valid.
But I am not sure whether the following one is valid:
http://server.com:80//path/info
The problem behind my question is that a cookie created by the URI http://server.com:80/path/info with its path restricted to /path is not sent to http://server.com:80//path/info.
See "URL with multiple forward slashes, does it break anything?", "Are there any downsides to using double-slashes in URLs?", "What does the double slash mean in URLs?" and RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax.
Consensus: browsers send the request as-is; they do not alter it. The / character is the path separator, but path segments are defined as:
path-abempty = *( "/" segment )
segment = *pchar
This means the slash after http://example.com can be directly followed by another slash, ad infinitum. Servers might ignore the extra slashes, but browsers don't, as you have found out.
The phrase:
If a URI does not contain an authority component, then the path cannot begin
with two slash characters ("//").
This allows for protocol-relative URLs, but specifically states that in that case no authority (server.com:80 in your example) may be present.
So: yes, it is valid; no, don't use it.
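The cookie behaviour in the question can be sketched as follows. This is an illustration, not any browser's code: it splits the URL with the RFC 3986 Appendix B regex and then applies the cookie path-match rule from RFC 6265, section 5.1.4, which is a plain prefix comparison, so "//path/info" does not match a cookie path of "/path". The helper names `path_of` and `path_matches` are hypothetical.

```python
import re

# Generic URI-reference splitting regex from RFC 3986, Appendix B.
URI_REFERENCE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def path_of(uri: str) -> str:
    """Return the path component exactly as it appears in the URI (no normalization)."""
    return URI_REFERENCE.match(uri).group(5)


def path_matches(request_path: str, cookie_path: str) -> bool:
    """Cookie path-match as described in RFC 6265, section 5.1.4."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # Match if the cookie path ends with "/" or the next request character is "/".
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


for url in ["http://server.com:80/path/info", "http://server.com:80//path/info"]:
    print(url, path_of(url), path_matches(path_of(url), "/path"))
```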
According to RFC 3986 Section 3 - Syntax Components:
The scheme and path components are required, though the path may be
empty (no characters).
Can someone clarify how the path component can be required if it's able to be empty? Maybe I'm misunderstanding the definition of "required" in this context, but I assumed it to mean something along the lines of "must be non-empty," which obviously conflicts with the spec here.
Here, "required" means merely "always present": the scheme and path
components of an absolute URI are always present.
The scheme component can't be empty because the production
"scheme" requires at least one character.
The path component can be empty because the production
"path-empty" (part of "hier-part") consists of zero characters.
A common practical example of an empty (more precisely, abempty) path is a URI like http://stackoverflow.com, where the path is empty. The authority component (in this case stackoverflow.com) alone isn't enough information to identify a resource.
When the authority is empty, the path must begin with a / in order to distinguish the path from the authority: scheme:/// is a valid URI with an empty authority, hence an abempty path. Also take a look at this answer for further reading.
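The "always present" reading can be checked with the splitting regex from RFC 3986, Appendix B. In this sketch (the helper name `components` is hypothetical), the path group always matches, possibly as an empty string, whereas an absent authority comes back as None. Note the regex only splits a reference into components; it does not validate it.

```python
import re

# Generic URI-reference splitting regex from RFC 3986, Appendix B.
URI_REFERENCE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def components(uri: str) -> dict:
    m = URI_REFERENCE.match(uri)
    # group(5) (the path) always participates in the match, so it is never None.
    return {"scheme": m.group(2), "authority": m.group(4), "path": m.group(5)}


for uri in ["http://stackoverflow.com", "scheme:///", "mailto:someone@example.com"]:
    print(uri, components(uri))
```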
I just learned from a colleague that omitting the "http | https" part of a URL in a link will make that URL use whatever scheme the page it's on uses.
So for example, if my page is accessed at http://www.example.com and I have a link (notice the '//' at the front):
<a href="//www.google.com">Google</a>
That link will go to http://www.google.com.
But if I access the page at https://www.example.com with the same link, it will go to https://www.google.com
I wanted to look online for more information about this, but I'm having trouble thinking of a good search phrase. If I search for "URLs without HTTP" the pages returned are about urls with this form: "www.example.com", which is not what I'm looking for.
Would you call that a schemeless URL? A protocol-less URL?
Does this work in all browsers? I tested it in FF and IE 8 and it worked in both. Is this part of a standard, or should I test more browsers?
Protocol relative URL
You may receive unusual security warnings in some browsers.
See also, Wikipedia Protocol-relative URLs for a brief definition.
At one time, it was recommended; but going forward, it should be avoided.
See also the Stack Overflow question Why use protocol-relative URLs at all?.
It is called a network-path reference (the missing part is called the scheme or protocol), defined in RFC 3986 Section 4.2:
4.2 Relative Reference
A relative reference takes advantage of the hierarchical syntax
(Section 1.2.3) to express a URI reference relative to the name space
of another hierarchical URI.
relative-ref = relative-part [ "?" query ] [ "#" fragment ]
relative-part = "//" authority path-abempty
/ path-absolute
/ path-noscheme
/ path-empty
The URI referred to by a relative reference, also known as the target URI, is obtained by applying the reference resolution
algorithm of Section 5.
A relative reference that begins with two slash characters is
termed a network-path reference (emphasis mine); such references are rarely used.
A relative reference that begins with a single slash character is termed an absolute-path reference. A relative reference that does not begin with a slash character is termed a relative-path reference.
A path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name. Such a segment must be preceded by a dot-segment (e.g., "./this:that") to make a relative-path reference.
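Resolving a network-path reference against a base URI keeps the base's scheme and replaces everything else, which is the behaviour observed in the question. A quick sketch with Python's `urllib.parse.urljoin` (used here as a convenient RFC 3986-style resolver; the page URLs are the ones from the question):

```python
from urllib.parse import urljoin

# The same href, resolved against the page it appears on.
for page in ["http://www.example.com/", "https://www.example.com/"]:
    print(page, "->", urljoin(page, "//www.google.com"))
```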