Do web browsers always send a trailing slash after a domain name? - http

Is there consistency and/or a standard on how browsers send a url to a host related to trailing slashes?
Meaning, if I type in http://example.com in the address bar of a web browser, is the browser suppose to add a trailing slash (http://example.com/) or not?

The HTTP request sent from the browser to the server does not include the domain name, only the "path" portion (starting from the first slash after the domain name). Since the path cannot be empty, a / is sent in that case.
A sample GET request for the root of a web site might be:
GET / HTTP/1.0
The / above cannot be omitted.

As RFC 2616 tells:
3.2.2 http URL
The "http" scheme is used to locate
network resources via the HTTP
protocol. This section defines the
scheme-specific syntax and
semantics for http URLs.
http_URL = "http:" "//" host [ ":"
port ] [ abs_path [ "?" query ]]
If the port is empty or not given,
port 80 is assumed. The semantics
are that the identified resource is
located at the server listening for
TCP connections on that port of that
host, and the Request-URI for the
resource is abs_path (section 5.1.2).
The use of IP addresses in URLs
SHOULD be avoided whenever possible
(see RFC 1900 [24]). If the
abs_path is not present in the URL, it
MUST be given as "/" when used as a
Request-URI for a resource (section
5.1.2). If a proxy receives a host name which is not a fully qualified
domain name, it MAY add its domain
to the host name it received. If a
proxy receives a fully qualified
domain name, the proxy MUST NOT change
the host name.
Read more: http://www.faqs.org/rfcs/rfc2616.html#ixzz0kGbpjYWa
5.1.2 Request-URI
...
For example, a client wishing to retrieve the
resource above directly from the
origin server would create a TCP
connection to port 80 of the host
"www.w3.org" and send the lines:
GET /pub/WWW/TheProject.html HTTP/1.1
Host: www.w3.org
followed by the remainder of the
Request. Note that the absolute path
cannot be empty; if none is present in
the original URI, it MUST be given
as "/" (the server root).
Read more: http://www.faqs.org/rfcs/rfc2616.html#ixzz0kGcaRbqU

Note that it's a very different matter when the URL has a path element:
http://example.com/dir
is a different URL than
http://example.com/dir/
and could in fact contain different content, and have a different search engine ranking.

As far as the protocol is concerned, http://example.com/something and http://example.com/something/ are quite different. Some servers might redirect you from one to the other if it is implemented in such a way.
As for the pure domain names, it always sends a request ending with a slash.
(The domain name itself is not included in the path section of an HTTP request, just as Greg Hewgill and the others wrote. It is, however, included in the headers.)
You can check it with a tool like Fiddler or WireShark.

Related

URLs and HTTP protocol

I am currently learning about how to transfer messages via URL to a host server. What I have learned so far is how a URL is composed: http://example.com:80/latest/example.jpg?d=400x400 gives me the image example.jpg in the dimension requested from the host via port 80 (which can be left out as HTTP always uses port 80). The request message for this would look like this:
GET latest/example.jpg?d=400x400 HTTP/1.1. The response message would look like this: HTTP/1.1 200 OK.
So it is clear to me how to GET some resource from a Host. But what's with the other HTTP methods like PUT, POST or DELETE? I don't understand where in the URL the HTTP method is carried for the host to read. How do I tell the host to PUT instead of GET?
There seems to be a small misconception about urls and the corresponding requests.
The url http://example.com:80/latest/example.jpg?d=400x400 is composed of 5 pieces:
The used protocol (in your case http)
The use fqdn - fully qualified domain name - (in your case example.com)
The port on the fqdn - in your case 80 - which is in your case unnecessary because your browser will default to 80 for http
your requested resource, in your case /latest/example.jpg
your requested query string parameters, indicated by ?, in your case the parameter d with the value 400x400
Note that the request message only looks like you outlined, because your browser defaults to the GET method of HTTP. As you correctly stated, there are various HTTP methods, such as PUT, POST, PATCH, DELETE, etc.
The HTTP-Method is stated in the HTTP Header, so it's up to the request which HTTP-Method is invoked.
For the "well-known" internet surfing, your typed url will always result in a GET request. For the other HTTP methods, it's up to the application (e.g. your Website or your normal software that uses HTTP requests) to enable the use. As an example, html enables the use of <form> tags where you can specify the http method, e.g. you can say to use POST.
To sum it up: Your url does not specify the HTTP-Methods.
Browsers default to GET, but in the end it's up to your application (and thus the logic behind it) which HTTP-method is used.

What is the correct way to render absolute URLs behind a reverse proxy?

I have a web application running on a server (let's say on localhost:8000) behind a reverse proxy on that same server (on myserver.example:80). Because of the way the reverse proxy works, the application sees an incoming request targeted at localhost:8000 and the framework I'm using therefore tries to generate absolute URLs that look like localhost:8000/some/ressource instead of myserver.example/some/ressource.
What would be "the correct way" of generating an absolute URL (namely, determining what hostname to use) from behind a proxy server like that? The specific proxy server, framework and language don't matter, I mean this more in an HTTP sense.
From my initial research:
RFC7230 explicitly says that proxies MUST change the Host header when passing the request along to make it look like the request came from them, so it would look like using Host to determine what hostname to use for the URL, yet in most places where I have looked, the general advice seems to be to configure your reverse proxy to not change the Host header (counter to the spec) when passing the request along.
RFC7230 also says that "request URI reconstruction" should use the following fields in order to find what "authority component" to use, though that seems to also only apply from the point-of-view of the agent that emitted that request, such as the proxy:
Fixed URI authority component from the server or outbound gateway config
The authority component from the request's firsr line if it's a complete URI instead of a path
The Host header if it's present and not empty
The listening address or hostname, alongside with the incoming port number if it's not the default one for the protocol
HTTP 1.0 didn't have a Host header at all, and that header was added for routing purposes, not for URL authority resolution.
There are headers that are made specifically to let proxies to send the old value of Host after routing, such as Via, Forwarded and the unofficial X-Forwarded-Host, which some servers and frameworks will check, but not all, and it's unclear which one should even take priority given how there's 3 of them.
EDIT: I also don't know whether HTTPS would work differently in that regard, given that the headers are part of the encrypted payload and routing has to be performed another way because of this.
In general I find it’s best to set the real host and port explicitly in the application rather than try to guess these from the incoming request.
So for example Jira allows you to set the Base URL through which Jira will be accessed (which may be different to the one that it is actually run as). This means you can have Jira running on port 8080 and have Apache or Nginx in front of it (on the same or even a different server) on port 80 and 443.

Are http and https resources equivalent?

Are HTTP and https resources equivalent? That is, does http://example.com/ABC refer to the same resource as https://example.com/ABC?
Evidence for: (1) Cookies with matching domain and path without "secure" attribute are set and returned independent of protocol. (2) HTTP strict transport security bounces you from HTTP to HTTPS with an implicit assumption the resource is the same.
Evidence against: (1) Same origin policy treats a different protocol as a different origin. (2) HTTP RFC shows HTTP, and https comparison is unequal. (3) Resources for other protocols like FTP aren't equivalent to HTTP resources for the same domain (e.g., FTP server root dir different), so what magic does https have over FTP in resource equivalence to HTTP?
I am going to say - Yes - they are the same resources.
The protocol only depicts the transportation layer.
To me
http://example.com/ABC
reads like following:
At example.com a commercial domain I have a resource called ABC.
I read the same for the following irrespective of protocol.
https://example.com/ABC
However web servers can be configured to represent and entirely different contents at the same ABC resource path based on https but in my mind they should not do so.
However the only caveat is if anyone wants to return some sort of warning for using plain HTTP we now have a different meaning but it should return 500 or some error condition for doing so.
The answer is, it depends on the web server configuration. They can and in a lot of cases do point to the same resources, because HTTP and HTTPS tends to be bound to the same single site/application.
However, because they are accessed over different TCP ports (HTTP port 80, HTTPS port 443), it is perfectly possible to have the HTTP resource be served up by a different bound site than the HTTPS resource with the same URI (except protocol) and therefore be totally different.

Do resources in a URL path have their own IP address?

So, a DNS server recognizes https://www.google.com as 173.194.34.5
What does, say, https://www.google.com/images/srpr/logo11w.png look like to a server? Or are URL strings machine readable?
Good question!
When you access a url, first a DNS lookup will be done on the host part (www.google.com), after that the browser will look at the protocol and connect using that (https in this case).
After connecting, the browser will tell the server:
"Hi! I'm trying to connect to www.google.com and I would like the resource /images/srpr/logo11w.png). This looks like this on the protocol:
GET /images/srpr/logo11w.png HTTP/1.1
Host: www.google.com
The Host part is a HTTP header. There are usually more headers.
So the short answer is:
The server will get access to both the hostname, and the full path the browser tried to access.
https://www.google.com/images/srpr/logo11w.png
consists of several parts
protocol (https)
address of the server (www.google.com, that gets translated to IP)
path to the resource (/images/srpr/logo11w.png, in this example it seems like it would be an image in a directory srpr, which is in a directory images in the root of the website)
The server processes path to the resource the user requested (via GET method) based on various rules and returns a response.

About the http request standard

GET http://stackoverflow.com/questions HTTP/1.1
Host: stackoverflow.com
Does the HTTP standard require that GET requests are fed with an absolute or relative address? What about when the request is in a proxy?
I ask this because I feel it's duplicate with the Host info.
GET / HTTP/1.1
Is a valid request line. The full path is not necessary.
5.1.2 Request-URI
The Request-URI is a Uniform Resource
Identifier (section 3.2) and
identifies the resource upon which to
apply the request.
Request-URI = "*" | absoluteURI | abs_path | authority
The four options for Request-URI are
dependent on the nature of the
request. The asterisk "*" means that
the request does not apply to a
particular resource, but to the server
itself, and is only allowed when the
method used does not necessarily apply
to a resource. One example would be
OPTIONS * HTTP/1.1
The absoluteURI form is REQUIRED when
the request is being made to a proxy.
The proxy is requested to forward the
request or service it from a valid
cache, and return the response. Note
that the proxy MAY forward the request
on to another proxy or directly to the
server specified by the absoluteURI. In order
to avoid request loops, a proxy MUST
be able to recognize all of its server
names, including any aliases, local
variations, and the numeric IP
address. An example Request-Line would
be:
GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1
To allow for transition to
absoluteURIs in all requests in future
versions of HTTP, all HTTP/1.1 servers
MUST accept the absoluteURI form in
requests, even though HTTP/1.1 clients
will only generate them in requests to
proxies.
You can consult the HTTP RFC for this.
3.2.1 General Syntax
URIs in HTTP can be represented in absolute form or relative to some
known base URI [11], depending upon the context of their use.
Host details are not required. Relative path is sufficient

Resources