RFC 7239 - Difference between 'by' and 'host' - http

I'm in the process of implementing an RFC 7239-compatible Forwarded header in an internal reverse proxy. In brief, the specification states that the values of the host and by parameters should be as follows:
host - the value of the host header as received by the proxy
by - the user agent facing interface of the proxy
What's the difference between these two? If I have a proxy server facing the internet at the address http://myexampleserver.com, then as I understand it both host and by would have the same value?

No, the original request will contain the requested site's DNS address as host header, not the proxy's DNS address.
The RFC mentions this host value's intended use:
This can be used, for example, by the origin
server if a reverse proxy is rewriting the "Host" header field to
some internal host name.
So, for example:
The user agent requests http://example.com/foo through the proxy http://yourexampleproxy/. The request will contain GET /foo HTTP/1.1 and Host: example.com.
Your proxy translates, by configuration, the Host: example.com header to Host: some-internal-foo, and adds the Forwarded: host=example.com;by=yourexampleproxy header, so the origin server can inspect it.
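The rewrite described above can be sketched roughly as follows. This is a minimal illustration, not a complete RFC 7239 implementation: the function name rewrite_for_origin, the plain dict of headers (with case-sensitive keys), and the quoting helper are assumptions made for the example. Values that are not plain tokens, such as a host with a port, are wrapped in double quotes.

```python
import re

# Characters allowed in an unquoted HTTP token; anything else gets quoted.
_TOKEN = re.compile(r"^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$")

def _param(value):
    """Return value as a token if possible, else as a quoted-string."""
    if _TOKEN.match(value):
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

def rewrite_for_origin(headers, internal_host, proxy_id):
    """Return a copy of the headers as the proxy would forward them.

    headers: dict of request headers as received (hypothetical shape).
    internal_host: the internal name the proxy rewrites Host to.
    proxy_id: the identifier the proxy reports in the 'by' parameter.
    """
    outgoing = dict(headers)
    # Record the Host header as received, plus the proxy's own identifier.
    element = "host=" + _param(headers["Host"]) + ";by=" + _param(proxy_id)
    # If an earlier proxy already added Forwarded, append to its list.
    if "Forwarded" in headers:
        outgoing["Forwarded"] = headers["Forwarded"] + ", " + element
    else:
        outgoing["Forwarded"] = element
    # Only now overwrite Host with the internal name.
    outgoing["Host"] = internal_host
    return outgoing

if __name__ == "__main__":
    incoming = {"Host": "example.com"}
    print(rewrite_for_origin(incoming, "some-internal-foo", "yourexampleproxy"))
```

The origin server then sees Host: some-internal-foo but can still recover the name the client asked for from the host parameter.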

Related

Usage of 'Host' Header in Web Requests

I am looking at the http-requests in BurpSuite. I see a field named as 'Host'. What is the importance of this field?
What happens if I change this field and then send the request? If I change the host header field to some other IP then would the server respond back to this new modified IP?
A single web server can host multiple websites with different domains and subdomains.
The Host header allows it to distinguish between them.
Given the limited availability of IPv4 addresses, this is important as there are more websites than available IP addresses.
What happens if I change this field and then send the request?
If the server pays attention to it and recognises the hostname, it will respond with that website (otherwise it may fall back to its default website or throw an error).
For an example, see Name-based Virtual Host Support in the Apache HTTPD manual.
If I change the host header field to some other IP then would the server respond back to this new modified IP?
No. The Host header is the host the client is asking for. It has nothing to do with where the response should be sent.
To quote from https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Host :
The Host request header specifies the host and port number of the server to which the request is being sent.
If no port is included, the default port for the service requested (e.g., 443 for an HTTPS URL, and 80 for an HTTP URL) is implied.
A Host header field must be sent in all HTTP/1.1 request messages. A 400 (Bad Request) status code may be sent to any HTTP/1.1 request message that lacks a Host header field or that contains more than one.

Why is routing a request based on its Host header a good way to proxy traffic?

In the proxy documentation for Kong, it is mentioned that
Routing a request based on its Host header is the most straightforward
way to proxy traffic through Kong, especially since this is the
intended usage of the HTTP Host header
However, for this to work, any incoming request from a client must now have its Host header set to a particular value. In general, HTTP clients don't intentionally modify this value, so how is this used in practice?
In other words, clients aren't in general modifying the HTTP host header in their request, as is done in the curl examples in the docs, e.g.:
curl --url http://proxy.mydomain.com:8000/ --header 'Host: service.example.com'
Given that the proxy is intended to be transparent to clients, why is it the case that 'this is the intended usage of the HTTP Host header'?
If the proxy is transparent to the client, the client usually doesn't know that a proxy is used and therefore resolves the IP Address via DNS. The client then establishes a TCP connection to the IP address accordingly.
The (transparent) proxy now intercepts the traffic. The Host header is then its only chance to get the server's FQDN. This is important if the connection is HTTPS, so that the proxy can use the Host header value as the SNI and verify the server's certificate.
Independent of the use of a transparent proxy, the host header should contain the server name which allows hosting multiple webpages using the same server HW.
Example:
Server IP 1.2.3.4 with 4 websites: www.a.com, www.b.com, www.c.com, www.d.com.
The client must provide the value of the website in the host header in order to allow the server to distinguish between the different websites.
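As a rough sketch of how a server might use the Host header to pick one of these sites, consider the lookup below. The document roots, the choice of www.a.com as the default site, and the function name select_site are all made up for illustration, and the port stripping assumes no IPv6 literals.

```python
# Hypothetical mapping from site name to document root on server 1.2.3.4.
SITES = {
    "www.a.com": "/var/www/a",
    "www.b.com": "/var/www/b",
    "www.c.com": "/var/www/c",
    "www.d.com": "/var/www/d",
}
DEFAULT_SITE = "www.a.com"  # assumed fallback when the name is unknown

def select_site(host_header):
    """Return the document root for the site named in the Host header."""
    # Drop an optional ":port" suffix; host names are case-insensitive.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return SITES.get(hostname, SITES[DEFAULT_SITE])

if __name__ == "__main__":
    for value in ("www.b.com", "WWW.C.COM:80", "unknown.example"):
        print(value, "->", select_site(value))
```

All four names resolve to the same IP address, so without the header the server would have nothing to key this lookup on.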

Why is the host address included in an HTTP 1.1 GET command?

GET /calcuapp/calculator.jsp HTTP/1.1
Host: 192.168.1.66:8080
I'm using PuTTY and the host destination is already set in the settings. Why do I need to type the host destination again, as you can see above?
The short answer is Virtual Hosts.
For many years now, it has been quite common to host multiple sites/domains from a single server. HTTP 1.1 supports this by requiring the host header. If you use HTTP 1.0 you may leave this out.
The Host HTTP header is mandatory since HTTP/1.1 and it's used for virtual hosting.
It must include the domain name of the server, and the TCP port number on which the server is listening. The port number may be omitted if the port is the standard port for the service requested (80 for HTTP and 443 for HTTPS).
An HTTP/1.1 request that lacks the Host header should be answered with a 400 (Bad Request) status code.
RFC 7230, the current reference for message syntax and routing in HTTP/1.1, tells the whole story about this header:
5.4. Host
The Host header field in a request provides the host and port
information from the target URI, enabling the origin server to
distinguish among resources while servicing requests for multiple
host names on a single IP address.
Host = uri-host [ ":" port ]
A client MUST send a Host header field in all HTTP/1.1 request
messages. If the target URI includes an authority component, then a
client MUST send a field-value for Host that is identical to that
authority component, excluding any userinfo subcomponent and its "@"
delimiter. If the authority component is missing or
undefined for the target URI, then a client MUST send a Host header
field with an empty field-value.
Since the Host field-value is critical information for handling a
request, a user agent SHOULD generate Host as the first header field
following the request-line.
For example, a GET request to the origin server for
http://www.example.org/pub/WWW/ would begin with:
GET /pub/WWW/ HTTP/1.1
Host: www.example.org
A client MUST send a Host header field in an HTTP/1.1 request even if
the request-target is in the absolute-form, since this allows the
Host information to be forwarded through ancient HTTP/1.0 proxies
that might not have implemented Host.
When a proxy receives a request with an absolute-form of
request-target, the proxy MUST ignore the received Host header field
(if any) and instead replace it with the host information of the
request-target. A proxy that forwards such a request MUST generate a
new Host field-value based on the received request-target rather than
forward the received Host field-value.
Since the Host header field acts as an application-level routing
mechanism, it is a frequent target for malware seeking to poison a
shared cache or redirect a request to an unintended server. An
interception proxy is particularly vulnerable if it relies on the
Host field-value for redirecting requests to internal servers, or for
use as a cache key in a shared cache, without first verifying that
the intercepted connection is targeting a valid IP address for that
host.
A server MUST respond with a 400 (Bad Request) status code to any
HTTP/1.1 request message that lacks a Host header field and to any
request message that contains more than one Host header field or a
Host header field with an invalid field-value.
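The 400 rule quoted above could be checked along these lines. This is only a sketch: the function name host_check_status and the list-of-tuples representation of the header fields are assumptions, and the "invalid field-value" test is deliberately partial (it only rejects whitespace, "/" and "@", none of which can appear in uri-host [ ":" port ]).

```python
def host_check_status(header_fields):
    """Return 400 if the Host rules are violated, otherwise None.

    header_fields: list of (name, value) tuples in the order received
    for an HTTP/1.1 request (hypothetical representation).
    """
    # Header field names are case-insensitive.
    hosts = [value for name, value in header_fields if name.lower() == "host"]
    # Missing Host, or more than one Host field: Bad Request.
    if len(hosts) != 1:
        return 400
    value = hosts[0].strip()
    # Partial validity check; an empty value is allowed by the RFC.
    if any(ch.isspace() or ch in "/@" for ch in value):
        return 400
    return None

if __name__ == "__main__":
    print(host_check_status([("Host", "www.example.org")]))
    print(host_check_status([("Accept", "*/*")]))
```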
Your local resolver (DNS etc) converts the host name on the command line to an IP address before connecting; there is no way for the remote server to know which host name you gave on the command line if there are multiple host names which resolve to the same IP address (this is what's called "virtual hosting"; with HTTP 1.0, you needed a separate IP address for each distinct HTTP host, which is extremely wasteful, but saves you from needing to transmit the Host: header).

Haproxy Appending Port to `HTTP_HOST` Header in Backend Request

I am using haproxy in front of my web-server for ssl termination.
I forward requests to port 81 if the request is HTTPS and to port 80 if it is plain HTTP:
backend b1_http
    mode http
    server bkend_server
backend b1_https
    mode http
    server bkend_server:81
The problem is that when HAProxy sends the request to the back-end, it sends the HTTP_HOST header as request.domain.com:81.
Is it possible in HAProxy to send an HTTPS request to the back-end on a specific port without appending the port to the HTTP_HOST request header?
There are two issues here.
First, there is no HTTP_HOST header. The header is Host:. It sounds like HTTP_HOST is something being generated internally by your web server or framework.
Second, HAProxy doesn't modify the Host: header just because your back-end is listening on a port other than 80. It doesn't actually modify the Host: header at all unless explicitly configured to, using a mechanism like reqirep ^Host: ... or http-request set-header host ....
You can confirm this with a packet capture. You should find that whatever HTTP_HOST is, the value is necessarily being generated internally on the back-end system itself, because it's not coming from HAProxy.
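For background on where a name like HTTP_HOST usually comes from: under the CGI convention (which WSGI and many frameworks follow), the server exposes each request header to the application as a variable built by upper-casing the header name, replacing "-" with "_", and prefixing "HTTP_". The snippet below only sketches that naming rule; the function name cgi_variable_name is made up, and whether this back-end actually works this way is an assumption.

```python
def cgi_variable_name(header_name):
    """Map a request header name to its CGI-style variable name.

    Note: Content-Type and Content-Length are exceptions in CGI and are
    exposed without the HTTP_ prefix; they are not handled here.
    """
    return "HTTP_" + header_name.upper().replace("-", "_")

if __name__ == "__main__":
    print(cgi_variable_name("Host"))
    print(cgi_variable_name("Accept-Encoding"))
```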

Is Port Number Required in HTTP "Host" Header Parameter?

Say I make an HTTP request to: foosite.com but the port I actually send the request to is 6103 and I DON'T put that port in the Host header for example:
GET /barpage HTTP/1.1
Host: foosite.com
Method: GET
Should http server then recognize that I'm trying to talk to it on port 6103? Or since it was omitted in the request header am I gambling on if the server actually recognizes this?
I ask that question to say this: I've found that browsers, at least firefox + chrome, put the port in the Host header. But the Java app I'm using does not. And when the port is not passed in the Host the server responds back thinking I'm on port 80. So who do I need to badger? The server operator, or the Java programmer?
See section 14.23 of the HTTP spec which specifies that the port # should be included if it's not the default port (80 for HTTP, 443 for HTTPS).
UPDATED for modern day browsers:
Browsers (and curl) will add the port only when it is not the standard port, as required by the HTTP spec and noted in @superfell's answer.
Browsers these days (2013) will actually strip the port from the Host header when the port is the standard one (port 80 for http, port 443 for https). Some clients that use their own method, like the Baidu Spider, include the port number even when the port is 80.
Whether this is proper or not, I don't know. The spec doesn't say whether it's OK or not to include the port number when the port used IS the default.
To answer your comment, servers will do whatever they need to do to comply with the spec, and the spec only covers the cases WHEN the port is needed. Because of this, I feel it's not really a question of how the server deals with it. It's more a question of how the client issues the request: with the port number in the Host header, or without.
RFC2616 states that
A "host" without any trailing port information implies the default
port for the service requested (e.g., "80" for an HTTP URL). For
example, a request on the origin server for
http://www.w3.org/pub/WWW/ would properly include:
GET /pub/WWW/ HTTP/1.1
Host: www.w3.org
This means that https://example.com would not need a trailing port either, since the default port is known for https. I have checked the HTTP requests from Firefox, Chrome and Edge and found that none of them added the port number to the Host header when the protocol was https. The port number is added, of course, when it is also present in the URL. The following headers come from Google Chrome.
Sample headers of an actual request to a hopefully nonexistent server 'http://myhost.com:3244/content/page.htm'
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: en-US;q=0.9,en;q=0.8,nb;q=0.7,de;q=0.6
Connection: keep-alive
Host: myhost.com:3244
Referer: http://myhost.com:3244/content/page.htm
The RFC https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html requires some training to read.
In Section 14.23 it is not so easy to translate all the elements into simple terms:
Host = "Host" ":" host [ ":" port ] ;
Host Header Syntax:
Host: <host>:<port>
If the port is not the default, put it after the host:
Host: example.com:1337
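The rule can be sketched as a small helper that derives the Host value from a URL and leaves the port out when it is the default for the scheme. The function name host_header_value is made up for this example, and it assumes an absolute http or https URL.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def host_header_value(url):
    """Build the Host header value for a request to the given URL."""
    parts = urlsplit(url)
    host = parts.hostname  # lower-cased, IPv6 brackets removed
    # Put the brackets back around an IPv6 literal.
    if ":" in host:
        host = "[" + host + "]"
    port = parts.port  # None when the URL carries no explicit port
    # Omit the port when it is absent or the default for the scheme.
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return host + ":" + str(port)

if __name__ == "__main__":
    for url in ("http://www.w3.org/pub/WWW/",
                "http://example.com:1337/",
                "https://example.com:443/"):
        print(url, "->", host_header_value(url))
```

This mirrors the browser behaviour described above: the port shows up in Host only when the URL uses a non-default one.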

Resources