What do we call the combined path, query, and fragment in a URI?

A URI is composed of several parts: scheme:[//[user[:password]@]host[:port]][/path][?query][#fragment]. Often, I find myself wanting to refer to the entire part of the URI to the right of the host and port, the part that would be considered the URI in an HTTP request:
GET /path?query#fragment
Host: example.com
As a shorthand, I normally call this the "path", but that's not quite accurate, as the path is only part of it. This is essentially the inverse of "What do you call the entire first part of a URL?"
Is there an agreed-upon name for this?

Within a full HTTP URI, there doesn’t seem to be a term that denotes everything coming after the authority.
If you only have the part in question as a URI reference (e.g., in an HTTP GET request), it’s called a relative reference:
relative-ref = relative-part [ "?" query ] [ "#" fragment ]
But this term also includes network-path references (often called protocol-relative URIs), e.g. //example.com/path?query#fragment. To exclude this case, you could use the terms for the other two cases:
absolute-path reference (begins with a single /, e.g. /path?query#fragment)
relative-path reference (doesn’t begin with a /, e.g., path?query#fragment)¹
¹ If the first path segment contains a :, you have to begin the relative-path reference with ./ (e.g., ./pa:th?query#fragment).
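To make the three cases concrete, here is a minimal Java sketch that labels a reference string with the RFC 3986 terms above. It assumes java.net.URI as the parser; the class name ReferenceKind and the method classify are made up for illustration and are not part of any library.

```java
import java.net.URI;

/** Hypothetical helper; the names here are illustrative only. */
final class ReferenceKind {

    /** Labels a URI reference with the RFC 3986 section 4.2 terms. */
    static String classify(String reference) {
        // URI.create throws IllegalArgumentException on syntactically invalid input.
        URI uri = URI.create(reference);
        if (uri.getScheme() != null) {
            // Anything with a scheme is a full URI, not a relative reference.
            return "not a relative reference";
        }
        if (reference.startsWith("//")) {
            return "network-path reference";
        }
        if (reference.startsWith("/")) {
            return "absolute-path reference";
        }
        return "relative-path reference";
    }
}
```

With this sketch, classify("pa:th?query#fragment") reports "not a relative reference", because the parser reads pa as a scheme, while classify("./pa:th?query#fragment") reports "relative-path reference". That is the situation the footnote describes.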

RFC 7230 says:
request-line = method SP request-target SP HTTP-version CRLF
I personally prefer to use the terms
origin for scheme and authority (where) and
resource for path, query string and fragment (what).
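If you want to split a URI along those two informal terms, something like the following Java sketch might do. It assumes a hierarchical URI that has an authority, and it uses java.net.URI; the class UriSplit and its methods origin and resource are hypothetical names that simply follow the wording above.

```java
import java.net.URI;

/** Hypothetical helper; "origin" and "resource" are the informal terms used above. */
final class UriSplit {

    /** Scheme plus authority, e.g. "foo://example.com:8042". Assumes an authority is present. */
    static String origin(String uri) {
        URI u = URI.create(uri);
        return u.getScheme() + "://" + u.getRawAuthority();
    }

    /** Path plus optional query and fragment, e.g. "/over/there?name=ferret#nose". */
    static String resource(String uri) {
        URI u = URI.create(uri);
        // The raw getters keep percent-encoding untouched.
        StringBuilder sb = new StringBuilder(u.getRawPath());
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        if (u.getRawFragment() != null) {
            sb.append('#').append(u.getRawFragment());
        }
        return sb.toString();
    }
}
```

Concatenating origin(x) and resource(x) gives back the original string for inputs of this shape.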

I'm not aware of any term for that portion of a URI.
RFC3986 says this
  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
   |   _____________________|__
  / \ /                        \
  urn:example:animal:ferret:nose

Related

How to determine if a URI is escaped?

I am using Apache Commons HttpClient to download web resources. The URIs for these resources come from third parties; I do not generate them.
The commons httpclient requires a URI object to be given to the GetMethod object.
The URI constructor takes a string (for the uri) and a boolean specifying if it is escaped or not.
Currently, I am doing the following to determine if the original url I am given is already escaped...
boolean isEscaped = URIUtil.getPathQuery(originalUrl).contains("%");
m.setURI(new URI(originalUrl, isEscaped));
Is this the correct way to determine if a uri is already escaped?
Update...
Well, according to Wikipedia (http://en.wikipedia.org/wiki/Percent-encoding), percent is a reserved character and should always be encoded. I am quoting verbatim here:
Percent-encoding the percent character
Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI.
Doesn't this mean that you can never have a naked '%' character in a valid URI?
Also, the uri(s) come from various sources so I cannot be sure if they are escaped or unescaped.
This wouldn't work. It's possible the un-encoded string has a % in it already.
For example,
https://www.google.com/#q=like%25&safe=off
is the URL for a Google search for like%. In unescaped form it would be https://www.google.com/#q=like%&safe=off
Your consumers should let you know if the URI is escaped or not.
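A stricter syntactic check than contains("%") is possible, though it still cannot settle the question. The Java sketch below only tests whether every % is followed by two hex digits. The class and method names are hypothetical, and it does not use any Commons HttpClient API.

```java
/** Hypothetical helper; it checks syntax only and cannot prove a string was escaped. */
final class PercentCheck {

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    /** Returns true if every '%' in s is followed by two ASCII hex digits. */
    static boolean everyPercentStartsAnEscape(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != '%') {
                continue;
            }
            // A '%' needs two more characters, and both must be hex digits.
            if (i + 2 >= s.length()
                    || !isHexDigit(s.charAt(i + 1))
                    || !isHexDigit(s.charAt(i + 2))) {
                return false;
            }
        }
        return true;
    }
}
```

A false result means the string cannot be a correctly escaped URI, as with the like%&safe=off form above. A true result is ambiguous: 50%25 could be the escaped form of 50% or the unescaped text 50%25, so the source of the URI still has to tell you which one it is.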

Is a URL with // in the path-section valid?

I have a question regarding URLs.
I've read RFC 3986 and still have a question about one URL:
If a URI contains an authority component, then the path component
must either be empty or begin with a slash ("/") character. If a URI
does not contain an authority component, then the path cannot begin
with two slash characters ("//"). In addition, a URI reference
(Section 4.1) may be a relative-path reference, in which case the
first path segment cannot contain a colon (":") character. The ABNF
requires five separate rules to disambiguate these cases, only one of
which will match the path substring within a given URI reference. We
use the generic term "path component" to describe the URI substring
matched by the parser to one of these rules.
I know that //server.com:80/path/info is valid (it is a scheme-relative URL).
I also know that http://server.com:80/path//info is valid.
But I am not sure whether the following one is valid:
http://server.com:80//path/info
The problem behind my question is that a cookie is not sent to http://server.com:80//path/info when it was created by the URI http://server.com:80/path/info with its path restricted to /path.
See url with multiple forward slashes, does it break anything?, Are there any downsides to using double-slashes in URLs?, What does the double slash mean in URLs? and RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax.
Consensus: browsers will send the request as-is; they will not alter it. The / character is the path separator, but path segments are defined as:
path-abempty = *( "/" segment )
segment = *pchar
Because a segment may be empty, the slash in http://example.com/ can be followed directly by another slash, ad infinitum. Servers might ignore it, but browsers don't, as you have figured out.
The phrase
If a URI does not contain an authority component, then the path cannot begin
with two slash characters ("//").
applies only to URIs without an authority, where a leading // would be mistaken for the start of a network-path (protocol-relative) reference. Your example has an authority (server.com:80), so this restriction does not apply to it.
So: yes, it is valid, no, don't use it.
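The cookie behaviour from the question follows if the browser compares paths as plain strings. Here is a minimal Java sketch, assuming the path-match rule from RFC 6265 section 5.1.4 (that rule is not quoted in this thread). The class CookiePath and its method names are hypothetical.

```java
import java.net.URI;

/** Hypothetical helper sketching RFC 6265 style path matching. */
final class CookiePath {

    /** True if requestPath falls under cookiePath. */
    static boolean pathMatches(String requestPath, String cookiePath) {
        if (requestPath.equals(cookiePath)) {
            return true;
        }
        if (!requestPath.startsWith(cookiePath)) {
            return false;
        }
        // The prefix must end at a segment boundary, so "/path" does not match "/pathology".
        return cookiePath.endsWith("/") || requestPath.charAt(cookiePath.length()) == '/';
    }

    /** Applies pathMatches to the path component of a full URL. */
    static boolean cookieWouldBeSent(String url, String cookiePath) {
        // getRawPath returns the path exactly as written, so "//path/info" keeps both slashes.
        return pathMatches(URI.create(url).getRawPath(), cookiePath);
    }
}
```

With a cookie path of /path, the sketch accepts http://server.com:80/path/info and rejects http://server.com:80//path/info, because //path/info does not start with /path. Nothing in the comparison collapses the duplicate slash.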

What is the name of this URL part?

If I have a URL:
http://mysite.com/part1/page.aspx
What is the name of the part1 part in that URL?
Justin is correct - it's one part of the path. Adding a reference to the URI RFC, for the pretty picture contained therein, which illustrates (in section 3) each part of an example URI:
  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
Further on, in the section devoted to the path:
A path consists of a sequence of path segments separated by a slash
("/") character. A path is always defined for a URI, though the
defined path may be empty (zero length). Use of the slash character
to indicate hierarchy is only required when a URI will be used as the
context for relative references. For example, the URI
mailto:fred@example.com has a path of "fred@example.com", whereas
the URI foo://info.example.com?fred has an empty path. (emphasis mine)
And so "path segment" might be the term you're looking for.
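If you need the segments programmatically, a minimal Java sketch could look like this. It assumes java.net.URI; the class PathSegments and its method of are hypothetical names for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the path segments of a URI. */
final class PathSegments {

    static List<String> of(String uri) {
        // java.net.URI reports a null path for opaque URIs such as mailto: addresses.
        String path = URI.create(uri).getRawPath();
        List<String> segments = new ArrayList<>();
        if (path == null || path.isEmpty()) {
            return segments;
        }
        // Drop the leading slash, then split; the -1 limit keeps empty segments.
        String rest = path.startsWith("/") ? path.substring(1) : path;
        for (String segment : rest.split("/", -1)) {
            segments.add(segment);
        }
        return segments;
    }
}
```

For http://mysite.com/part1/page.aspx this returns the two segments part1 and page.aspx, so part1 is the first path segment.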
That would be the first part of the path. There's no term to describe that part alone.
Might it possibly be named "the protocol"?

URL without "http|https"

I just learned from a colleague that omitting the "http | https" part of a URL in a link will make that URL use whatever scheme the page it's on uses.
So for example, if my page is accessed at http://www.example.com and I have a link (notice the '//' at the front):
<a href="//www.google.com">Google</a>
That link will go to http://www.google.com.
But if I access the page at https://www.example.com with the same link, it will go to https://www.google.com
I wanted to look online for more information about this, but I'm having trouble thinking of a good search phrase. If I search for "URLs without HTTP" the pages returned are about urls with this form: "www.example.com", which is not what I'm looking for.
Would you call that a schemeless URL? A protocol-less URL?
Does this work in all browsers? I tested it in FF and IE 8 and it worked in both. Is this part of a standard, or should I test more browsers?
Protocol relative URL
You may receive unusual security warnings in some browsers.
See also, Wikipedia Protocol-relative URLs for a brief definition.
At one time, it was recommended; but going forward, it should be avoided.
See also the Stack Overflow question Why use protocol-relative URLs at all?.
It is called a network-path reference (the part that is missing is called the scheme or protocol), defined in RFC 3986, Section 4.2:
4.2 Relative Reference
A relative reference takes advantage of the hierarchical syntax
(Section 1.2.3) to express a URI reference relative to the name space
of another hierarchical URI.
relative-ref  = relative-part [ "?" query ] [ "#" fragment ]
relative-part = "//" authority path-abempty
              / path-absolute
              / path-noscheme
              / path-empty
The URI referred to by a relative reference, also known as the target URI, is obtained by applying the reference resolution algorithm of Section 5.
A relative reference that begins with two slash characters is
termed a network-path reference (emphasis mine); such references are rarely used.
A relative reference that begins with a single slash character is termed an absolute-path reference. A relative reference that does not begin with a slash character is termed a relative-path reference.
A path segment that contains a colon character (e.g., "this:that") cannot be used as the first segment of a relative-path reference, as it would be mistaken for a scheme name. Such a segment must be preceded by a dot-segment (e.g., "./this:that") to make a relative-path reference.
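To see the scheme inheritance outside a browser, you can resolve a network-path reference against a base URI. This is a minimal Java sketch that assumes java.net.URI.resolve behaves like the Section 5 algorithm for this simple case. The class name NetworkPathDemo is hypothetical, and the base URLs are the ones from the question.

```java
import java.net.URI;

/** Hypothetical demo of resolving a network-path reference against a base URI. */
final class NetworkPathDemo {

    /** Resolves reference against base and returns the resulting URI as a string. */
    static String resolve(String base, String reference) {
        // For a reference with an authority but no scheme, the result takes
        // the scheme from the base and everything else from the reference.
        return URI.create(base).resolve(reference).toString();
    }
}
```

Resolving //www.google.com against http://www.example.com/ yields an http URI for www.google.com, and against https://www.example.com/ it yields an https one, which matches what the question observed in the browser.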

What is the correct terminology for breaking up a URI into its component parts?

Suppose we have a string "http://www.example.com/feed". I am breaking this string up into three pieces for use with Apache's URI class:
1. "http"
2. "www.example.com"
3. "/feed"
Is there a proper term for this process of breaking down a URI into its component pieces?
A URI can be parsed into its component parts.
The following are two examples from RFC 3986:
  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
   |   _____________________|__
  / \ /                        \
  urn:example:animal:ferret:nose
A uri can be either a url or a urn.
Split or parse? I think it's really semantics, and there's not an agreed upon term.
I would always use the term parsing.
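In Java you often don't need to do the splitting by hand, because the parser already exposes the components. This is a minimal sketch using java.net.URI from the standard library (not the Apache URI class mentioned in the question); the class name UriParts is hypothetical.

```java
import java.net.URI;

/** Hypothetical helper returning the three pieces named in the question. */
final class UriParts {

    /** Returns { scheme, host, path } for a hierarchical URI. */
    static String[] parse(String uri) {
        // URI.create parses the string; the getters expose the components.
        URI u = URI.create(uri);
        return new String[] { u.getScheme(), u.getHost(), u.getPath() };
    }
}
```

For "http://www.example.com/feed" the array holds "http", "www.example.com" and "/feed", the same three pieces listed in the question.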
