Why is routing a request based on its Host header a good way to proxy traffic? - http

In the proxy documentation for Kong, it is mentioned that
Routing a request based on its Host header is the most straightforward
way to proxy traffic through Kong, especially since this is the
intended usage of the HTTP Host header
However, for this to work, any incoming request from a client must now have its Host header set to a particular value. In general, HTTP clients don't intentionally modify this value, so how is this used in practice?
In other words, clients aren't in general modifying the HTTP host header in their request, as is done in the curl examples in the docs, e.g.:
curl --url http://proxy.mydomain.com:8000/ --header 'Host: service.example.com'
Given that the proxy is intended to be transparent to clients, why is it the case that 'this is the intended usage of the HTTP Host header'?

If the proxy is transparent to the client, the client usually doesn't know that a proxy is used and therefore resolves the IP Address via DNS. The client then establishes a TCP connection to the IP address accordingly.
The (transparent) proxy now intercepts the traffic. The Host header is now the only chance to get the servers FQDN. This is important if connection is HTTPS so proxy can use the Host header value as SNI / Verify the Server's certificate.
Independent of the use of a transparent proxy, the host header should contain the server name which allows hosting multiple webpages using the same server HW.
Example:
Server IP 1.2.3.4 with 4 websites: www.a.com, www.b.com, www.c.com, www.d.com.
The client must provide the value of the website in the host header in order to allow the server to distinguish between the different websites.

Related

Usage of 'Host' Header in Web Requests

I am looking at the http-requests in BurpSuite. I see a field named as 'Host'. What is the importance of this field?
What happens if I change this field and then send the request? If I change the host header field to some other IP then would the server respond back to this new modified IP?
A single web server can host multiple websites with different domains and subdomains.
The Host header allows it to distinguish between them.
Given the limited availability of IPv4 addresses, this is important as there are more websites than available IP addresses.
What happens if I change this field and then send the request?
If the server pays attention to it and recognises the hostname, it will respond with that website (otherwise it may fall back to its default website or throw an error).
For an example, see Name-based Virtual Host Support in the Apache HTTPD manual.
If I change the host header field to some other IP then would the server respond back to this new modified IP?
No. The Host header is the host the client is asking for. It has nothing to do with where the response should be sent.
To quote from https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Host :
The Host request header specifies the host and port number of the server to which the request is being sent.
If no port is included, the default port for the service requested (e.g., 443 for an HTTPS URL, and 80 for an HTTP URL) is implied.
A Host header field must be sent in all HTTP/1.1 request messages. A 400 (Bad Request) status code may be sent to any HTTP/1.1 request message that lacks a Host header field or that contains more than one.

Faking an HTTP request header

I have a general networking question but it's related with security aspect.
Here is my case: I have a host which is infected by a malware. The malware creates an http packet to communicate with it's command and control server. While constructing the packet, the IP layer contains the correct IP address of the command and control server. The tcp layer contains the correct port number 80.
Before sending the packet out, the malware modifies the http header to replace the host header with “google.com" instead of it's server address. It then attaches the stolen data with the packet and sends it out.
My understanding is that the packet will get delivered to the correct server because the routing will happen based on the IP.
But can I host a webserver on this IP that would receive all packets with header host google.com and parse it correctly?
Based on my reading on the internet, it is possible but if it is that easy then why have malware authors not adopted this technique to spoof the http headers and bypass traditional domain whitelisting engines.
When you make a request to let's say Apache2 server, what actually Apache does is match your "Host" header with any VirtualHost within server's configuration. Only if it cannot be found / is invalid, Apache will route the request to default virtualhost if it's defined. Basically nothing stops you from changing these headers.
You can simply test it by editing your hosts file and pointing google.com to any other IP - you will be able to handle the google.com domain on your server, but only you will be to use it this way - no one else.
Anything you send inside HTTP headers shouldn't be trusted - it just a guide for your server on how to actually handle the traffic.
The fake host header is just there to trick some deep-inspection firewalls ("it's for Google? you may pass..."). The server on that IP either doesn't care about the host header (default vhost) or is explicitly configured to accept it.
Passing the loot on by using fake headers or just as plain data behind the headers is another trick to fool data loss prevention.
These methods can mislead shallow application-layer inspection but won't pass a decent firewall.

How does a HTTP Reverse Proxy work

I saw a website that offers HTTP Reverse proxy and TCP Reverse Proxy, I know what TCP Reverse Proxy is, but I have no clue what the difference is between HTTP & TCP Reverse proxy.
Try to explain it to someone that would like to buy it, because i want to know how it works. Those website's that offers it usually says "1 Protected Domain".
Contrary to a simple TCP reverse proxy a HTTP proxy can choose the target HTTP server based on the content of the HTTP request, i.e. based on the target host name (Host header), the path, Cookies, the User-Agent etc. It can also include additional headers like X-Forwarded-For to include the clients original IP address.

Should x-forwarded-for contain a proxy in https traffic?

I have a web server cluster behind a proxy/load balancer. That proxy contains my SSL certs and hands the web servers the decrypted traffic, and along the way adds an "x-forwarded-for" header into the HTTP header the web application receives. This application has seen millions of IP addresses over the past decade, but something weird happened today.
For the first time, I saw an x-forwarded-for that contained a second address reach the application [addressed altered]:
x-forwarded-for: 62.211.19.218, 177.168.159.85
This indicates that the traffic came through a proxy, and I understand this is normal for x-f-f. I would have thought this was impossible (or at least unlikely) with https as the protocol.
Can someone explain how this is legit?
As per RFC 7239, this HTTP header is specified as
X-Forwarded-For: client, proxy1, proxy2, ...
Where client is the IP of the original client and then each proxy adds the IP it received the request from, at the end of the list. In the above example, you would see IP of proxy3 in your webserver and proxy2 is the IP which connected to the proxy3.
As anyone can put anything inside this header, you should accept it only from known sources like your own reverse proxy or whitelist of known legit proxies. For example Apache has mod_rpaf, which transparently changes client IP address to the one provided in this header, but only if the request is received from the IP of known proxy server.
On corporate networks you can easily do transparent proxying for HTTPS traffic without any notice from normal users. Just create your own certification authority, use for example Windows Group Policy to install & trust this CA on all corporate workstations. Then redirect all HTTPS connections to your proxy which will generate certificate for all visited domains on the fly. This is something which is happening and you can even buy enterprise hardware proxies using this method.
So to summarize the reasons why you could see multiple IPs in the X-Forwarded-For header:
Transparent HTTPS proxy as mentioned above
The header was added by the requestor itself (browser, wget, script) for whatever reason, for example to hide its own IP
Some CDN like Cloudflare could add that header if used
Multiple reverse proxies defined either intentionally or by mistake
Conclusion: You should only trust this header if it originates from your own proxy (in case of multiple IPs, trust only the last one).
MAYBE it's using the Proxy protocol for HTTPS. Granted you may not be using httproxy, but this seems to be a decent description:
http://www.haproxy.org/download/1.5/doc/proxy-protocol.txt
I'm not sure about the SSL cert, but there's no guarantee someone is doing something pathalogical (maybe unintentionally) like running all their HTTPS traffic through a proxy and then accepting all the invalid certificates. But I suspect the proxy protocol might make this work; it does expose the HTTP headers to the proxy in some sense.

Pros and cons of using a Http proxy v/s https proxy?

The JVM allows proxy properties http.proxyHost and http.proxyPort for specifying a HTTP proxy server and https.proxyHost and https.proxyPort for specifying a HTTPS proxy server .
I was wondering whether there are any advantages of using a HTTPS proxy server compared to a HTTP proxy server ?
Is accessing a https url via a HTTPS proxy less cumbersome than accesing it from a HTTP proxy ?
HTTP proxy gets a plain-text request and [in most but not all cases] sends a different HTTP request to the remote server, then returns information to the client.
HTTPS proxy is a relayer, which receives special HTTP request (CONNECT verb) and builds an opaque tunnel to the destination server (which is not necessarily even an HTTPS server). Then the client sends SSL/TLS request to the server and they continue with SSL handshake and then with HTTPS (if requested).
As you see, these are two completely different proxy types with different behavior and different design goals. HTTPS proxy can't cache anything as it doesn't see the request sent to the server. With HTTPS proxy you have a channel to the server and the client receives and validates server's certificate (and optionally vice versa). HTTP proxy, on the other hand, sees and has control over the request it received from the client.
While HTTPS request can be sent via HTTP proxy, this is almost never done because in this scenario the proxy will validate server's certificate, but the client will be able to receive and validate only proxy's certificate, and as name in the proxy's certificate will not match the address the socket connected to, in most cases an alert will be given and SSL handshake won't succeed (I am not going into details of how to try to address this).
Finally, as HTTP proxy can look into the request, this invalidates the idea of security provided by HTTPS channel, so using HTTP proxy for HTTPS requests is normally done only for debugging purposes (again we omit cases of paranoid company security policies which require monitoring of all HtTPS traffic of company employees).
Addition: also read my answer on the similar topic here.
There are no pros or cons.
And there are no "HTTPS proxy" server.
You can tell the protocol handlers which proxy server to use for different protocols. This can be done for http, https, ftp and socks. Not more and not less.
I can't tell you if you should use a different proxy for https connections or not. It depends.
I can only explain the difference of an http and https request to a proxy.
Since the HTTP Proxy (or web proxy) understands HTTP (hence the name), the client can just send the request to the proxy server instead of the actual destenation.
This does not work for HTTPS.
This is because the proxy can't make the TLS handshake, which happens at first.
Therefore the client must send a CONNECT request to the proxy.
The proxy establishes a TCP connection and just sends the packages forth and back without touching them.
So the TLS handshake happens between the client and destenation.
The HTTP proxy server does not see everything and does not validate destenation servers certificate whatsoever.
There can be some confusion with this whole http, https, proxy thing.
It is possible to connect to a HTTP proxy with https.
In this case, the communication between the client and the proxy is encrypted.
There are also so called TLS terminating or interception proxy servers like Squid's SSL Peek and Splice or burp, which see everything.
But this should not work out of the box, because the proxy uses own certificates which are not signed by trusted CAs.
References
https://docs.oracle.com/javase/8/docs/technotes/guides/net/proxies.html
https://parsiya.net/blog/2016-07-28-thick-client-proxying---part-6-how-https-proxies-work/
http://dev.chromium.org/developers/design-documents/secure-web-proxy
https://www.rfc-editor.org/rfc/rfc2817#section-5
https://www.rfc-editor.org/rfc/rfc7231#section-4.3.6
If you mean connecting to a HTTP proxy server over TLS by saying HTTPS proxy, then
I was wondering whether there are any advantages of using a HTTPS
proxy server compared to a HTTP proxy server ?
The advantage is that your client's connection to proxy server is encrypted. E.g. A firewall can't not see which host you use CONNECT method connect to.
Is accessing a https url via a HTTPS proxy less cumbersome than
accesing it from a HTTP proxy ?
Everything is the same except that with HTTPS proxy, brower to proxy server connection is encrypted.
But you need to deploy a certificate on your proxy server, like how a https website does, and use a pac file to configure the brower to enable Connecting to a proxy over SSL.
For more details and a practical example, check my question and answer here HTTPs proxy server only works in SwitchOmega
Unfortunately, "HTTPS proxy" has two distinct meanings:
A proxy that can forward HTTPS traffic to the destination. This proxy itself is using an HTTP protocol to set up the forwarding.
In case the browser is trying to connect to a website using HTTPS, the browser will send a CONNECT request to the proxy, and the proxy will set up a TCP connection with the website and mirror all TCP traffic sent on the connection from the browser to the proxy onto the connection between the proxy and the website, and similarly mirror the response TCP packet payload from the webite to the connection with the browser. Hypothetically, the same mechanism using CONNECT could be used with HTTP traffic, but practically speaking browsers don't do that. For HTTP traffic, they send the actual HTTP request to the proxy, including the full path in the HTTP command (as well as setting the Host header): https://stackoverflow.com/a/38259076/10026
So, by this definition, HTTPS Proxy is a proxy that understands the CONNECT directive and can support HTTPS traffic going between the browser and the website.
A proxy that uses HTTPS protocol to secure client communication.
In this mode (sometimes referred to as "Secure Proxy"), the browser uses the proxy's own certificate to perform TLS handshake with the proxy, and then sends either HTTP or HTTPS traffic, (including CONNECT requests), on that connection as per (1). So, the connection between the browser and the proxy is always protected with a TLS key derived using the proxy's certificate, regardless of whether the traffic itself is encrypted with a key negotiated between the browser and the website. If HTTPS traffic is proxied via a secure proxy, it is double-encrypted on the connection between the browser and the proxy.
For example, the Proxy Switcher Chrome plugin has two separate settings to control each of these funtionalities:
As of 2022, the option to use a secure proxy is not available in MacOS and Windows manual proxy configuration UI. But a secure proxy may be specified in a PAC file used in automatic proxy configuration using the HTTPS proxy directive. It is up to the consuming application to support the HTTPS directive; most major browsers, except Safari, and many desktop apps support it.
NOTE: Things get a bit more complicated because some proxies that proxy HTTPS traffic don't simply forward TCP packet payload, as described in (1), but act as Intercepting Proxies. Using a spoofed website certificate, they effectively perform a Man-in-the-Middle attack (well, it's not necessarily an attack because it's expected behavior). Whereas the browser thinks it's using the website's certificate to set up a TLS tunnel with a website, it's actually using a spoofed certificate to set up TLS tunnel with the proxy, and the proxy sets up the TLS tunnel with the website. Then proxy has visibility into the HTTPS requests/responses. But all of that is completely orthogonal to whether the proxy is acting as a secure proxy as per (2).

Resources