Why would a webserver need to rely on "Host" header? - http

I'm trying to understand vulnerabilities arising from the HTTP(S) Host header. I have heard that web servers may use the value of the Host header from incoming requests for various purposes, such as constructing URLs. For example, here is an excerpt from the Django documentation:
Django uses the Host header provided by the client to construct URLs in certain cases.
I know that no information in an HTTP(S) request can be trusted. Web servers know what host name they are behind. So why would they take it from the untrusted Host header when the host name could be configured manually?
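For illustration, here is a minimal sketch of how an application might build a URL from the Host header while refusing host names it was not configured for, similar in spirit to Django's ALLOWED_HOSTS setting. The function name, its parameters, and the example host names are assumptions made for this sketch, not Django's implementation.

```python
from typing import Iterable


def build_absolute_url(host_header: str, path: str,
                       allowed_hosts: Iterable[str],
                       scheme: str = "https") -> str:
    """Build an absolute URL from the Host header, but only for known hosts.

    Hypothetical helper: it ignores IPv6 literals and wildcard patterns.
    """
    # Drop an optional ":port" suffix and normalise case before comparing.
    hostname = host_header.split(":", 1)[0].lower()
    if hostname not in {h.lower() for h in allowed_hosts}:
        raise ValueError(f"untrusted Host header: {host_header!r}")
    return f"{scheme}://{host_header}{path}"
```

With an allowlist of ["example.com"], a request carrying Host: example.com yields https://example.com/reset/abc for the path /reset/abc, while Host: evil.test is rejected instead of ending up in a generated link.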

Related

Faking an HTTP request header

I have a general networking question, but it is related to security.
Here is my case: I have a host that is infected by malware. The malware creates an HTTP packet to communicate with its command and control server. While constructing the packet, the IP layer contains the correct IP address of the command and control server. The TCP layer contains the correct port number, 80.
Before sending the packet out, the malware modifies the HTTP header to replace the Host header with "google.com" instead of its server address. It then attaches the stolen data to the packet and sends it out.
My understanding is that the packet will get delivered to the correct server because the routing will happen based on the IP.
But can I host a webserver on this IP that would receive all packets with the header Host: google.com and parse them correctly?
Based on my reading on the internet, it is possible, but if it is that easy, then why have malware authors not adopted this technique to spoof the HTTP headers and bypass traditional domain whitelisting engines?
When you make a request to, say, an Apache2 server, Apache matches your "Host" header against the VirtualHosts in the server's configuration. Only if no match is found, or the header is invalid, does Apache route the request to the default virtual host, if one is defined. Basically, nothing stops you from changing these headers.
You can test this simply by editing your hosts file and pointing google.com to any other IP: you will be able to handle the google.com domain on your server, but only you will be able to use it this way, no one else.
Anything you send inside HTTP headers shouldn't be trusted; it is just a guide for your server on how to handle the traffic.
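The virtual-host matching described above can be sketched roughly as follows. This is not Apache's actual implementation; the VHOSTS table, the document roots, and the function name are hypothetical.

```python
# Hypothetical virtual-host table: host name -> document root.
VHOSTS = {
    "example.com": "/var/www/example",
    "shop.example.com": "/var/www/shop",
}
DEFAULT_ROOT = "/var/www/default"


def select_vhost(host_header, vhosts=VHOSTS, default=DEFAULT_ROOT):
    """Pick a document root from the client-supplied Host header."""
    # A missing header is treated like an unknown host.
    hostname = (host_header or "").split(":", 1)[0].lower()
    # Unknown hosts (e.g. a spoofed "google.com") fall through to the default.
    return vhosts.get(hostname, default)
```

The point of the sketch is that the lookup key is entirely under the client's control; the server only decides what to do with it.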
The fake host header is just there to trick some deep-inspection firewalls ("it's for Google? you may pass..."). The server on that IP either doesn't care about the host header (default vhost) or is explicitly configured to accept it.
Passing the loot on by using fake headers or just as plain data behind the headers is another trick to fool data loss prevention.
These methods can mislead shallow application-layer inspection but won't pass a decent firewall.
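To see that the Host header is independent of the address a request is actually sent to, here is a small self-contained sketch using only the Python standard library. It starts a throwaway server on 127.0.0.1 that echoes back whatever Host header it received; the handler and function names are made up for this demo.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHostHandler(BaseHTTPRequestHandler):
    """Replies with the Host header exactly as the client sent it."""

    def do_GET(self):
        body = (self.headers.get("Host") or "").encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def request_with_host(fake_host: str) -> str:
    """Send a GET to a local server while claiming an arbitrary Host."""
    server = HTTPServer(("127.0.0.1", 0), EchoHostHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        # Supplying our own Host header replaces the one http.client would add.
        conn.request("GET", "/", headers={"Host": fake_host})
        body = conn.getresponse().read().decode()
        conn.close()
        return body
    finally:
        server.shutdown()
        server.server_close()
```

Calling request_with_host("google.com") connects to 127.0.0.1, yet the server sees Host: google.com, which is the situation the question describes.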

How to Eliminate Nginx from the Production Stack via using Cloudflare as a substitutable Reverse Proxy?

Is it possible to use "cloudflare" as a reverse proxy for hosting several websites on the same host machine but on different ports?
Cloudflare can replace some of the features of Nginx, specifically:
Caching resources
Rate limiting and protecting your website
Redirecting access to your website to another server
But you still need Nginx or another web server for the following tasks:
Handling the TCP connections between Cloudflare and the server which generates the response (+ HTTPS should be used)
Generating the actual response, via FastCGI (PHP, Python, Ruby, etc.) or just delivering a file/resource (server and location blocks in Nginx)
Setting the correct headers for the response, for caching and content type (Cloudflare relies on these)
Cloudflare does not support sending your requests to specific ports on the origin host - but that would still not help you much, because Cloudflare has a very specific feature set, and generating responses is not part of them, which is why you need a web server.
If you want to reduce the work needed to maintain Nginx, you can restrict Nginx to only reply to requests by Cloudflare and do the rate limiting and some other tasks in Cloudflare.
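In Nginx itself this restriction would normally be done in the server configuration. Purely to illustrate the idea, here is a Python sketch that checks whether a connecting address falls inside a set of proxy ranges. The ranges below are documentation placeholders, not Cloudflare's real address ranges, which would have to come from Cloudflare's published list.

```python
import ipaddress

# Placeholder networks reserved for documentation; NOT Cloudflare's ranges.
PROXY_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_from_proxy(peer_ip: str, ranges=PROXY_RANGES) -> bool:
    """Return True if the connecting address belongs to a known proxy range."""
    addr = ipaddress.ip_address(peer_ip)
    # Membership tests across IP versions simply evaluate to False.
    return any(addr in net for net in ranges)
```

A request whose peer address fails this check would be rejected before any application logic runs.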

Should x-forwarded-for contain a proxy in https traffic?

I have a web server cluster behind a proxy/load balancer. That proxy contains my SSL certs and hands the web servers the decrypted traffic, and along the way adds an "x-forwarded-for" header into the HTTP header the web application receives. This application has seen millions of IP addresses over the past decade, but something weird happened today.
For the first time, I saw an x-forwarded-for header that contained a second address reach the application [addresses altered]:
x-forwarded-for: 62.211.19.218, 177.168.159.85
This indicates that the traffic came through a proxy, and I understand this is normal for x-f-f. I would have thought this was impossible (or at least unlikely) with https as the protocol.
Can someone explain how this is legit?
X-Forwarded-For is a de facto standard header (RFC 7239 standardizes the equivalent Forwarded header). Its conventional format is
X-Forwarded-For: client, proxy1, proxy2, ...
Here client is the IP of the original client, and each proxy appends the IP address it received the request from to the end of the list. In the example above, a third proxy (proxy3) forwarded the request to your web server: your web server sees proxy3's IP as the connecting address, and proxy2 is the IP that connected to proxy3.
As anyone can put anything inside this header, you should accept it only from known sources, such as your own reverse proxy or a whitelist of known legitimate proxies. For example, Apache has mod_rpaf, which transparently changes the client IP address to the one provided in this header, but only if the request is received from the IP of a known proxy server.
On corporate networks you can easily do transparent proxying of HTTPS traffic without normal users noticing. Create your own certification authority and use, for example, Windows Group Policy to install and trust this CA on all corporate workstations. Then redirect all HTTPS connections to your proxy, which generates certificates for all visited domains on the fly. This happens in practice, and you can even buy enterprise hardware proxies that use this method.
So to summarize the reasons why you could see multiple IPs in the X-Forwarded-For header:
Transparent HTTPS proxy as mentioned above
The header was added by the requestor itself (browser, wget, script) for whatever reason, for example to hide its own IP
Some CDN like Cloudflare could add that header if used
Multiple reverse proxies defined either intentionally or by mistake
Conclusion: You should only trust this header if it originates from your own proxy (in case of multiple IPs, trust only the last one).
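The conclusion above can be sketched as a small function that ignores the header unless the request came from a trusted proxy, and then reads the list from the right. The function name, its parameters, and the 10.0.0.0/8 proxy network used in the usage note are assumptions for illustration.

```python
import ipaddress


def client_ip(peer_ip, xff_header, trusted_proxies):
    """Best-guess client address given the TCP peer and X-Forwarded-For.

    trusted_proxies is a list of CIDR strings for proxies you operate.
    """
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]

    def is_trusted(ip):
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            return False  # not an IP address at all
        return any(addr in net for net in trusted)

    # If the request did not come from our own proxy, ignore the header.
    if not is_trusted(peer_ip):
        return peer_ip
    hops = [h.strip() for h in (xff_header or "").split(",") if h.strip()]
    # Entries on the right were appended by the proxies closest to us, so
    # walk right to left and stop at the first hop we do not operate.
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    # Every hop was one of our proxies (or the header was empty).
    return hops[0] if hops else peer_ip
```

With the header from the question and a proxy inside 10.0.0.0/8, client_ip("10.0.0.5", "62.211.19.218, 177.168.159.85", ["10.0.0.0/8"]) returns "177.168.159.85": the last entry is the one your own proxy added, and anything to its left is an unverified claim.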
Maybe it's using the PROXY protocol for HTTPS. Granted, you may not be using HAProxy, but this seems to be a decent description:
http://www.haproxy.org/download/1.5/doc/proxy-protocol.txt
I'm not sure about the SSL cert, but there's no guarantee that someone isn't doing something pathological (maybe unintentionally), like running all their HTTPS traffic through a proxy and then accepting all the invalid certificates. But I suspect the proxy protocol might make this work; it does expose the HTTP headers to the proxy in some sense.

Real life usage of the X-Forwarded-Host header?

I've found some interesting reading on the X-Forwarded-* headers, including the Reverse Proxy Request Headers section in the Apache documentation, as well as the Wikipedia article on X-Forwarded-For.
I understand that:
X-Forwarded-For gives the address of the client which connected to the proxy
X-Forwarded-Port gives the port the client connected to on the proxy (e.g. 80 or 443)
X-Forwarded-Proto gives the protocol the client used to connect to the proxy (http or https)
X-Forwarded-Host gives the content of the Host header the client sent to the proxy.
These all make sense.
However, I still can't figure out a real life use case of X-Forwarded-Host. I understand the need to repeat the connection on a different port or using a different scheme, but why would a proxy server ever change the Host header when repeating the request to the target server?
If you use a service like Apigee as the front-end to your APIs, you will need something like X-FORWARDED-HOST to know what hostname was used to connect to the API. Apigee is configured with whatever your backend DNS name is, so nginx and your app stack only see the backend DNS name in the Host header, not the hostname that was originally called.
This is the scenario I worked on today:
Users access a certain application server using the "https://neaturl.company.com" URL, which points to a reverse proxy. The proxy terminates SSL and forwards users' requests to the actual application server, which has the URL "http://192.168.1.1:5555". The problem: when the application server needed to redirect the user to another page on the same server using an absolute URL, it used the latter URL, which users cannot reach. Using X-Forwarded-Host (plus X-Forwarded-Proto and X-Forwarded-Port) allowed our proxy to tell the application server which URL the user originally used, so the server started generating correct absolute URLs in its responses.
In this case there was no option to stop the application server from generating absolute URLs, nor to configure a "public URL" for it manually.
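A rough sketch of what the application server does with these headers in such a setup is shown below. It assumes headers arrive as a plain dict with exactly these capitalised names, and that the caller has already decided whether the request came from the trusted proxy; both are simplifications made for this example.

```python
def public_base_url(headers, from_trusted_proxy):
    """Reconstruct the scheme://host[:port] the user originally typed."""
    if from_trusted_proxy and "X-Forwarded-Host" in headers:
        host = headers["X-Forwarded-Host"]
        proto = headers.get("X-Forwarded-Proto", "http")
        port = headers.get("X-Forwarded-Port")
    else:
        # Fall back to what this server itself was addressed as.
        host, proto, port = headers.get("Host", "localhost"), "http", None
    default_port = {"http": "80", "https": "443"}.get(proto)
    # Only spell out the port when it is not the scheme's default
    # and the host value does not already carry one.
    if port and port != default_port and ":" not in host:
        host = f"{host}:{port}"
    return f"{proto}://{host}"
```

For the scenario above, headers of Host: 192.168.1.1:5555, X-Forwarded-Host: neaturl.company.com, X-Forwarded-Proto: https and X-Forwarded-Port: 443 give https://neaturl.company.com when the proxy is trusted, and http://192.168.1.1:5555 otherwise.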
I can give you a real-life example: an issue I had with an IBM portal.
In my case the problem was that the IBM portal has a REST service which returns a URL for a resource, something like:
{"url":"http://internal.host.name/path"}
What happened?
Simple: when you come in from the intranet, everything works fine because internal.host.name exists. But when the user comes in from the internet, the internal host name cannot be resolved and the portal breaks.
The fix for the IBM portal was to read the X-FORWARDED-HOST header and then change the response to something like:
{"url":"http://internet.host.name/path"}
See that I put internet and not internal in the second response.
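The rewrite itself is small. A sketch in Python (not the portal's actual code; the function name is made up) that swaps only the host part of a URL for the value taken from X-Forwarded-Host:

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_host(url, forwarded_host):
    """Replace the host in url with the one from X-Forwarded-Host, if any."""
    if not forwarded_host:
        return url  # no proxy involved; keep the internal URL
    parts = urlsplit(url)
    # Scheme, path, query and fragment are kept; only the host changes.
    return urlunsplit(parts._replace(netloc=forwarded_host))
```

For example, rewrite_host("http://internal.host.name/path", "internet.host.name") returns "http://internet.host.name/path".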
As for the need for 'x-forwarded-host', I can think of a virtual hosting scenario where there are several internal hosts (on an internal network) and a reverse proxy sitting between those hosts and the internet. If the requested host is part of the internal network, its name resolves to the reverse proxy's IP and the web browser sends the request to the reverse proxy. The reverse proxy finds the appropriate internal host and forwards the client's request to it. In doing so, the reverse proxy changes the Host field to match the internal host and sets x-forwarded-host to the host the client actually requested. More details on reverse proxies can be found on this Wikipedia page: http://en.wikipedia.org/wiki/Reverse_proxy.
Check this post for details on x-forwarded-for header and a simple demo python script that shows how a web-server can detect the use of a proxy server: x-forwarded-for explained
One example could be a proxy that blocks certain hosts and redirects them to an external block page. In fact, I’m almost certain my school filter does this…
(And the reason they might not just pass on the original Host as Host is because some servers [Nginx?] reject any traffic to the wrong Host.)
X-Forwarded-Host just saved my life. CDNs (or reverse proxies, if you'd like to go down to "trees") determine which origin to use from the Host header a user sends them. Thus, a CDN can't use the same Host header to contact the origin; otherwise, the CDN would go to itself in a loop rather than going to the origin. So the CDN uses either an IP address or some dummy FQDN as the Host header when fetching content from the origin. Now, the origin may wish to know the original Host header (that is, the website name) the content was requested for. In my case, one origin served 2 websites.
Another scenario: you license your app to a single host URL, and then you want to load balance across n > 1 servers.
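The CDN behaviour described here can be sketched as a lookup from the public host name to an origin, with the original name passed along in X-Forwarded-Host. The ORIGINS table and all host names are invented for the example; real CDNs configure this differently.

```python
# Hypothetical mapping: public site name -> origin the CDN fetches from.
# Both sites share one origin, as in the case described above.
ORIGINS = {
    "www.site-a.example": "origin.internal.example",
    "www.site-b.example": "origin.internal.example",
}


def origin_request(client_host, origins=ORIGINS):
    """Return (origin, headers) for the upstream fetch, or None if unknown."""
    hostname = client_host.split(":", 1)[0].lower()
    origin = origins.get(hostname)
    if origin is None:
        return None
    # Host must name the origin, otherwise the request would loop back to
    # the CDN; the name the user asked for travels in X-Forwarded-Host.
    return origin, {"Host": origin, "X-Forwarded-Host": client_host}
```

The origin can then tell the two websites apart by reading X-Forwarded-Host, even though both requests arrive with the same Host value.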

Where is origin of HttpContext.Current.Request.Url.Host?

Why does HttpContext.Current.Request.Url.Host return a different URL than the URL used in the Web browser? For example, when entering "www.someurl.com" in the browser, the HttpContext.Current.Request.Url.Host variable is equal to "www.someotherurl.com".
HttpContext.Current.Request.Url.Host is the contents of the Host header that the ASP.net application receives. (see http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html for more info about HTTP headers like Host).
Usually the header that ASP.NET sees is identical to the Host header sent by the browser. However, it's possible they won't match if software or hardware is sitting in between the browser and your ASP.net code and is rewriting the Host header.
For example, large-scale budget hosters like GoDaddy do this so they can support multiple top-level domains on a single IIS website, even on their cheaper hosting plans. Instead of creating a separate IIS website (which adds to server load and hence cost), GoDaddy remaps requests for http://secondsite.com/ to a virtual directory on your "main" hosted site (e.g. http://firstsite.com/secondsite). They will change both the Host: header and the URL.
BTW, you can easily verify that this is what's happening by dumping the contents of HTTP Request Headers that your app is receiving.
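The question is about ASP.NET, but the same check works on any stack. As a language-neutral illustration, here is a tiny Python WSGI application that echoes the request headers it received; the function name is made up, and an ASP.NET version would iterate over the request's header collection instead.

```python
def dump_headers_app(environ, start_response):
    """WSGI app that returns the incoming request headers as plain text."""
    lines = []
    for key, value in sorted(environ.items()):
        # WSGI exposes request headers as HTTP_* keys in the environ dict.
        if key.startswith("HTTP_"):
            name = key[5:].replace("_", "-").title()
            lines.append(f"{name}: {value}")
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

It can be served locally with wsgiref.simple_server.make_server("", 8000, dump_headers_app).serve_forever(). If the Host line in the output differs from what was typed in the browser, something in between rewrote it.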
Anyway, if you want to figure out who is changing the Host header, start with the people who host your web app (or the team which is responsible for your load balancer and/or reverse proxy), since they're likely the ones responsible for rewriting your Host header.

Resources