I have two URLs:
http://example.com/foo
and
http://example.com/foo/
Are they different URLs or the same? The same question is and about FTP protocol (ftp://example.com/foo[/])
In the URI standard, the relevant section is Normalization and Comparison:
After doing a simple string comparison, these URIs are not equivalent.
After applying syntax-based normalization, these URIs are not equivalent.
For scheme-based normalization, you have to refer to the specifications of the http/https and ftp URI schemes, and check if any scheme-specific rules are defined:
For http/https, these rules are in the section http and https URI Normalization and Comparison, and there don’t seem to be any for your case.
For ftp, there don’t seem to be defined any normalization/comparison rules.
For protocol-based normalization, you have to take something like redirects into account (in case of http).
tl;dr: The URIs are not equivalent.
Note that this is not the case for an empty path in HTTP(S) URIs, as the section linked above defines:
[…] an empty path component is equivalent to an absolute path of "/" […]
So the following URIs are equivalent:
http://example.com/
http://example.com
By the way, for the protocol-based normalization, the standard gives your case as an example:
[…] For example, if they observe that a URI such as
http://example.com/data
redirects to a URI differing only in the trailing slash
http://example.com/data/
they will likely regard the two as equivalent in the future. […]
Yes, they are different resource.
It's particularly important in HTML.
If you have a relative link bar (blah):
In the https://www.example.com/foo, that link resolves to the https://www.example.com/bar
While in the https://www.example.com/foo/, that link resolves to https://www.example.com/foo/bar
But HTTP servers will usually redirect the https://www.example.com/foo to the https://www.example.com/foo/, when foo is a folder, to avoid this confusion.
With the FTP protocol, it's probably client-specific, as the FTP protocol itself does not work with URLs.
So it depends on the FTP client how it behaves, if you use the https://www.example.com/foo, when the foo is actually a folder. The "FTP client" in this case typically means a web browser, as these work with URLs. Dedicated FTP clients usually do not work with URLs either.
Related
I need help in rewriting the URL in nginx configuration which should work as below :
/products/#details to /produce/#items
but it is not working as # is creating a problem.
Note : # in the URL denotes the page section
e.g. www.test.com/products/#details should get redirected to www.test.com/produce/#items
This is impossible using nginx because browsers don't send hashtags (#details) to servers. So you cannot rewrite in nginx or any other web servers.
In other words, hashtags is available to the browser only, so you have to deal it with Javascript. The server can not read it.
https://www.rfc-editor.org/rfc/rfc2396#section-4
When a URI reference is used to perform a retrieval action on the identified resource, the optional fragment identifier, separated from the URI by a crosshatch ("#") character, consists of additional reference information to be interpreted by the user agent after the retrieval action has been successfully completed. As such, it is not part of a URI, but is often used in conjunction with a URI.
There is no way to do this rewrite. The # and everything that precedes it will not be sent to the server, it is completely handled on the client side.
Which of them is correct definition of URL rewriting?
Shortening of URL for end-user as elaborated here
Appending extra arguments to URL sent to server for session management
I am confused over which one was invented first, and which one should be correct definition of URL re-write?
URL rewriting is simply what its name implies: rewriting/modification of URLs.
This can be used for shortening URLs, and it can be used for appending query parameters, but it’s not restricted to these two use cases.
To make the URL more readable
Example www.abc.com?id=1 rewrite it to www.abc.com/1
When comparing two URIs to decide if they match or not, a client
SHOULD use a case-sensitive octet-by-octet comparison of the entire
URIs, with these exceptions:
I read above Sentence in Http Rfc I think Url is case-insensitive but i dont undrestand what that means
?
RFC 3986 states:
the scheme and host are case-insensitive and therefore should be normalized to lowercase. For example, the URI <HTTP://www.EXAMPLE.com/> is equivalent to <http://www.example.com/>. The other generic syntax components are assumed to be case-sensitive unless specifically defined otherwise by the scheme
RFC 2616 defines the following comparison rule for the HTTP scheme:
When comparing two URIs to decide if they match or not, a client SHOULD use a case-sensitive octet-by-octet comparison of the entire URIs, with these exceptions:
However, RFC 7230 locks it down further by stating:
The scheme and host are case-insensitive and normally provided in lowercase; all other components are compared in a case-sensitive manner.
Those rules typically apply to client side comparisons. There are no rules specifically geared for server side comparisons. Once a server breaks up a URI into its components, it should treat them according to the same rules, but I don't see that enforced in the RFCs. Some web servers, like Apache, do follow the rules. IIS doesn't, for compatibility with Windows' case-insensitive file system.
In reality it depends on the web server.
IIS is not case sensitive.
Apache is.
I suspect that the decision regarding IIS is rooted in the fact that the Windows file system is not case sensitive.
IIS still meets that portion of the spec because SHOULD is a recommendation, not a requirement.
The host portion of the URI is not case sensitive:
http://stackoverflow.com
http://StackOverflow.com
Either of the above will get you to this site.
The rest of the URI after the host portion can be case sensitive. It depends on the server.
As mentioned in answer by Remy Lebeau, the rules are set for client side. Actually, this means that client software should not try to make arbitrary case modifications to all parts of URIs, except for specifically stated parts. So, when a browser e.g. sees a relative URL in a page anchor, is should not convert it to lowercase before checking if it is already cached in its cache; neither should it use the URI lowercased to post to server. Also, it should not decide that two URIs that differ in case only point to same resource (thus possibly wrongly skipping a transaction and returning cached result instead).
This means that client should not assume how servers treat the URIs. It does require servers to treat some parts case-insensitive: e.g., scheme and host. But otherwise, it's up to server to decide if two URIs that differ in case point to the same resource, or not. Standard does not impose any restrictions on servers in this regards, there's nothing "server should" or "server should not" besides directly prescribed. If server decides that its URIs are case-insensitive, that's absolutely fine. If they are case-sensitive, that's fine, too.
Whether or not URLs are treated as case-sensitive also depends on the web server. For example, Microsoft IIS servers do not treat URLs as case-sensitive.
The following URLs (hosted on a Microsoft IIS server) are both treated as equivalent:
http://www.microsoft.com/default.aspx
http://www.microsoft.com/Default.aspx
However, Apache servers do treat URLs as case-sensitive are classed as two different resources:
http://httpd.apache.org/index.html
http://httpd.apache.org/Index.html
Technically, Apache is following the standards correctly here, and Microsoft is going against the specification… Oh well – “old habits die hard,” they say!
For a file-based URI, case-sensitivity depends more on the underlying file system, not so much the web server. Apache will happily return index.html for INDEX.html on Windows (FAT, NTFS) and mac (HFS), but not for case-sensitive file systems such as those usually used in Linux (extx and so forth).
As far as I understand, an URL consists of the folowing fields:
Protocol (http, https, ftp, etc.)
User name
User Password
Host address (an IP address or a DNS FQDN)
Port (which can be implied)
Path to a document inside the server documents root
Set of arguments and values
Document part (#)
as
protocol://user:password#host:port/path/document?arg1=val1&arg2=val2#part
But I've just met an example of using "http://" inside the path part: there is a redirection service (showing ads and paying money for traffic you route through it) which just adds a target URL (in full form, with "http://") to its own. Is it considered ok from standards point of view? Doesn't it break anything? Normally I'd never expect to meet "//" double slash, a colon or a "#" inside a valid URL but on the places they are in the example above.
No, it is not okay from a standards perspective.
Per Section 3.3 Path Component in RFC-2396, path cannot contain the following characters - "/", ";", "=", and "?"
Usually, browsers encode such malformed URIs before making the http request, which is why it works in practice.
I want to have requests for the www subdomain or for alternate top-level domains redirected to one canonical URL.
To avoid HTTP/HTTPS issues, I figured the easiest way would be to just send a scheme-relative URI in the Location header, like so:
HTTP/1.1 301 Moved Permanently
Location: //example.com/
This seems to work fine in browsers, but the toy »validator« at http://no-www.org/ does not handle it correctly. Is this just a single badly written script, or is this behavior actually more common in scripts, crawlers, etc. out there?
Location expects an absolute URI:
[…] The field value consists of a single absolute URI.
Location = "Location" ":" absoluteURI
Although most user agents will also accept relative URIs, you should stick to the specification and provide an absolute URI.