Should x-forwarded-for contain a proxy in https traffic?

I have a web server cluster behind a proxy/load balancer. That proxy holds my SSL certs and hands the web servers the decrypted traffic, and along the way adds an "x-forwarded-for" header to the HTTP request the web application receives. This application has seen millions of IP addresses over the past decade, but something weird happened today.
For the first time, I saw an x-forwarded-for that contained a second address reach the application [addresses altered]:
x-forwarded-for: 62.211.19.218, 177.168.159.85
This indicates that the traffic came through a proxy, and I understand this is normal for x-f-f. I would have thought this was impossible (or at least unlikely) with https as the protocol.
Can someone explain how this is legit?

X-Forwarded-For is a de facto standard header (RFC 7239 standardizes its successor, the Forwarded header) with the form
X-Forwarded-For: client, proxy1, proxy2, ...
where client is the IP of the original client and each proxy appends the IP it received the request from to the end of the list. In the above example, the connection to your web server would come from a proxy3 (whose IP appears as the remote address, not in the header), and proxy2 is the IP which connected to proxy3.
As anyone can put anything inside this header, you should accept it only from known sources like your own reverse proxy or a whitelist of known legitimate proxies. For example, Apache has mod_rpaf, which transparently replaces the client IP address with the one provided in this header, but only if the request is received from the IP of a known proxy server.
On corporate networks you can easily do transparent proxying of HTTPS traffic without normal users noticing. Just create your own certificate authority and use, for example, a Windows Group Policy to install and trust this CA on all corporate workstations. Then redirect all HTTPS connections to your proxy, which generates a certificate for each visited domain on the fly. This is something that really happens, and you can even buy enterprise hardware proxies that use this method.
So to summarize the reasons why you could see multiple IPs in the X-Forwarded-For header:
Transparent HTTPS proxy as mentioned above
The header was added by the requestor itself (browser, wget, script) for whatever reason, for example to hide its own IP
Some CDN like Cloudflare could add that header if used
Multiple reverse proxies defined either intentionally or by mistake
Conclusion: You should only trust this header if it originates from your own proxy (in case of multiple IPs, trust only the last one).
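To make "trust only your own proxy" concrete, here is a minimal sketch in Python; the trusted proxy address is a made-up example and the parsing is illustrative, not a drop-in implementation:

# Derive the client IP from X-Forwarded-For only when the TCP peer
# (REMOTE_ADDR) is a proxy we operate. TRUSTED_PROXIES is a hypothetical value.
TRUSTED_PROXIES = {"10.0.0.5"}

def client_ip(remote_addr, xff_header):
    if remote_addr not in TRUSTED_PROXIES or not xff_header:
        # The connection did not come through our proxy: use the socket address.
        return remote_addr
    # Our proxy appended the address it received the request from, so only the
    # last entry is trustworthy; everything before it is hearsay.
    return xff_header.split(",")[-1].strip()

print(client_ip("10.0.0.5", "62.211.19.218, 177.168.159.85"))  # -> 177.168.159.85
print(client_ip("203.0.113.7", "62.211.19.218"))               # -> 203.0.113.7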

MAYBE it's using the PROXY protocol for HTTPS. Granted, you may not be using HAProxy, but this seems to be a decent description:
http://www.haproxy.org/download/1.5/doc/proxy-protocol.txt
I'm not sure about the SSL cert, but there's no guarantee someone isn't doing something pathological (maybe unintentionally) like running all their HTTPS traffic through a proxy and then accepting all the invalid certificates. But I suspect the PROXY protocol might make this work: it carries the original client address to the next hop out of band, so a terminating proxy can surface it, for example as an extra X-Forwarded-For entry.
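For reference, version 1 of the PROXY protocol is human-readable: the proxy prepends a single line to the TCP stream before any application data (the addresses below are made up), for example:

PROXY TCP4 198.51.100.22 203.0.113.7 56324 443\r\n

The backend strips this line and learns the original source address, even though the TLS bytes that follow remain opaque to the proxy.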

Related

Does internal communication between private servers use DNS and HTTPS?

I would like to know how internal communication links between private internal servers and a reverse proxy look.
When I make a request from my client (browser) to, say, https://facebook.com, I hit Facebook's reverse proxy. I have two questions. When that reverse proxy gets a request and needs to forward it to the server that should handle it, does the server it forwards the request to have a domain name (e.g. user.facebook.com or useroffacebook.com), or is it just an IP address (e.g. 34.23.66.25, which I made up, so do not go to that address)? Also, does that connection use HTTP or HTTPS?
Like Kshitij Joshi already mentioned, it could be both.
A more detailed perspective for implementation:
the reverse proxy should use IP addresses for routing, so routing keeps working even if DNS fails or is unavailable to the proxy for some reason.
internal traffic should also be encrypted (HTTPS). Using plain text, even on internal networks, must be considered dangerous and is not recommended.
In my view you can replace each 'should' with a 'must'.
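As a sketch of what those two points look like together, here is Python code that connects to a backend by raw IP (no DNS in the path) while still doing full certificate validation; the IP, the hostname, and the CA file name are invented for illustration:

import socket
import ssl

BACKEND_IP = "10.1.2.3"                 # assumption: internal address of the app server
BACKEND_NAME = "app.internal.example"   # assumption: name on the backend's certificate

ctx = ssl.create_default_context()
# With an internal CA you would load it explicitly instead of the system store:
# ctx.load_verify_locations("internal-ca.pem")

with socket.create_connection((BACKEND_IP, 443), timeout=5) as raw:
    # server_hostname sets the SNI value and the name the certificate is
    # verified against, independently of the IP we dialed.
    with ctx.wrap_socket(raw, server_hostname=BACKEND_NAME) as tls:
        tls.sendall(b"GET / HTTP/1.1\r\nHost: app.internal.example\r\n"
                    b"Connection: close\r\n\r\n")
        print(tls.recv(4096).decode(errors="replace"))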

Why is routing a request based on its Host header a good way to proxy traffic?

In the proxy documentation for Kong, it is mentioned that
Routing a request based on its Host header is the most straightforward
way to proxy traffic through Kong, especially since this is the
intended usage of the HTTP Host header
However, for this to work, any incoming request from a client must now have its Host header set to a particular value. In general, HTTP clients don't intentionally modify this value, so how is this used in practice?
In other words, clients aren't in general modifying the HTTP host header in their request, as is done in the curl examples in the docs, e.g.:
curl --url http://proxy.mydomain.com:8000/ --header 'Host: service.example.com'
Given that the proxy is intended to be transparent to clients, why is it the case that 'this is the intended usage of the HTTP Host header'?
If the proxy is transparent to the client, the client usually doesn't know that a proxy is used and therefore resolves the IP address via DNS. The client then establishes a TCP connection to that IP address.
The (transparent) proxy now intercepts the traffic. The Host header is then its only chance to learn the server's FQDN. This is important if the connection is HTTPS, so the proxy can use the Host header value as the SNI and to verify the server's certificate.
Independent of the use of a transparent proxy, the Host header should contain the server name, which allows hosting multiple websites on the same server hardware.
Example:
Server IP 1.2.3.4 with 4 websites: www.a.com, www.b.com, www.c.com, www.d.com.
The client must provide the website's name in the Host header so that the server can distinguish between the different websites.
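A toy dispatcher makes the mechanism visible; the site names mirror the example above and the response bodies are placeholders:

from http.server import BaseHTTPRequestHandler, HTTPServer

# One listening socket, many sites: the Host header selects the content,
# just as a name-based virtual host configuration would.
SITES = {
    "www.a.com": b"site A",
    "www.b.com": b"site B",
    "www.c.com": b"site C",
    "www.d.com": b"site D",
}

class VHostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        host = (self.headers.get("Host") or "").split(":")[0].lower()
        body = SITES.get(host, b"default site")  # unknown Host -> default site
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("", 8000), VHostHandler).serve_forever()

Pointed at such a server, curl --header 'Host: www.b.com' http://1.2.3.4:8000/ returns "site B", while the same URL with a different Host header returns a different site.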

Faking an HTTP request header

I have a general networking question but it's related with security aspect.
Here is my case: I have a host which is infected by malware. The malware creates an HTTP packet to communicate with its command and control server. The IP layer contains the correct IP address of the command and control server, and the TCP layer contains the correct port number, 80.
Before sending the packet out, the malware modifies the HTTP header, replacing the Host header with "google.com" instead of its server's address. It then attaches the stolen data to the packet and sends it out.
My understanding is that the packet will get delivered to the correct server because the routing will happen based on the IP.
But can I host a webserver on this IP that would receive all packets with header host google.com and parse it correctly?
Based on my reading on the internet, it is possible, but if it is that easy, why have malware authors not adopted this technique to spoof HTTP headers and bypass traditional domain whitelisting engines?
When you make a request to, let's say, an Apache2 server, what Apache actually does is match your "Host" header against the VirtualHosts in the server's configuration. Only if no match is found (or the value is invalid) will Apache route the request to the default VirtualHost, if one is defined. Basically nothing stops you from changing these headers.
You can simply test it by editing your hosts file and pointing google.com to any other IP: you will be able to handle the google.com domain on your server, but only you will be able to use it this way; no one else will.
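For example, a single line in /etc/hosts (the address is made up) repoints the name for your machine only:

203.0.113.50    google.com

The curl equivalent, without touching the hosts file, is to send the request directly to your server while labelling it as being for google.com:

curl --header 'Host: google.com' http://203.0.113.50/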
Anything you send inside HTTP headers shouldn't be trusted; it's just a guide for your server on how to handle the traffic.
The fake host header is just there to trick some deep-inspection firewalls ("it's for Google? you may pass..."). The server on that IP either doesn't care about the host header (default vhost) or is explicitly configured to accept it.
Passing the loot on by using fake headers or just as plain data behind the headers is another trick to fool data loss prevention.
These methods can mislead shallow application-layer inspection but won't pass a decent firewall.

Do HTTPS connections require HTTPS proxies or can I use HTTP proxies?

The question is about HTTP vs HTTPS.
If I want to anonymously load a website that forces HTTPS, like Google.com, do I need HTTPS proxies, or can I get away with HTTP proxies?
If your proxy is SOCKS, it will not care what kind of socket is connecting through it. It has its own handshake and does not care what happens after that handshake. Whether an SSL handshake (HTTPS) starts after the SOCKS handshake is not the SOCKS proxy's problem; it will just pass through.
Some HTTP proxies, on the other hand, expect HTTP headers to guide them; such a proxy will not allow HTTPS, since the headers it needs to read are encrypted.
On the third hand (ekhm... well, foot?), an HTTP proxy that supports HTTP CONNECT can also set up the transfer of arbitrary data. Such a proxy can therefore set up any type of socket, which can carry an SSL handshake and then an HTTPS transfer.
An HTTP proxy server that supports the CONNECT verb can handle HTTPS connections; you don't need a special HTTPS proxy server or any other setup.
The CONNECT verb lets the client create a binary socket tunnel to any given IP:port address, so any HTTP client (all browsers) can open a secure tunnel and communicate securely through the proxy server. However, no one can control or see anything going through the tunnel unless they mount a man-in-the-middle attack by sending you self-signed certificates.
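On the wire, the client's exchange with the proxy looks like this (the target name is just an example); after the 200 response, the TLS handshake bytes flow through the tunnel opaquely:

CONNECT www.google.com:443 HTTP/1.1
Host: www.google.com:443

HTTP/1.1 200 Connection established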
Most corporate firewalls these days do implement exactly that man-in-the-middle, using certificates deployed across the work network, so you will probably have to dig deeper to find out whether your connection is really secure. It may not be that anonymous.
If you're trying to access a service anonymously, you won't get this by running your own proxy. It's not clear from the original question what is meant by "proxy", e.g. local service or remote service. You won't get anonymity by surfing through a proxy that's on your network, unless it's something like a Tor proxy which relays out through the Tor network.
As for whether proxies can support HTTPS or not, that's been covered here, it would be unusual to find a proxy that doesn't support CONNECT. However if it's a remote anonymizing service you're using, I doubt they would do MitM, since you'd need to install the signing cert into your trusted root store, so they couldn't do that surreptitiously.

Reliably getting a web client IP

What is the most reliable way of obtaining the IP address of a remote client connecting to your website? Some options I've looked into are:
Server variables (such as REMOTE_ADDR in Apache), though this is usually the proxy address.
A Java applet, but IE (at least the one I'm using) seems to deny it.
The only other thing I'm thinking about is having the client connect over HTTPS, in which case the proxy should be bypassed (generally speaking), and so REMOTE_ADDR would be accurate.
Any ideas?
Anything client-side (JavaScript, Java) will give you the PC's IP address, which could be an internal address like 10.0.0.1.
Re: SSL + REMOTE_ADDR, most workplace proxies send all SSL through an application-level proxy; some just allow 443 outbound. Anything coming through a proxy will still give you the proxy address, as the proxy is still the computer making the connection to your web server.
HTTPS through a proxy is still a possibility, if the proxy is non-transparent (say, with a client on a corporate network). With HTTPS through a proxy, the REMOTE_ADDR will still be the proxy address - the proxy is still in the path, it just only gets to see the encrypted traffic.
If the client is going through a proxy, you'll have to rely on the proxy telling you their IP. The X-Forwarded-For header will contain this, but you can only really rely on this if you trust the proxy. If this is for logging purposes, log both REMOTE_ADDR and X-Forwarded-For. If it's for something else, you'll need to maintain a whitelist of proxies (as determined by REMOTE_ADDR) that you'll accept X-Forwarded-For from.
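For the logging route, Apache's mod_log_config can record both values side by side; a minimal sketch (the format name combined_xff is arbitrary):

LogFormat "%h %{X-Forwarded-For}i %l %u %t \"%r\" %>s %b" combined_xff
CustomLog logs/access_log combined_xff

Here %h is the remote host (the proxy, if one is in the path) and %{X-Forwarded-For}i is whatever that peer claimed, to be believed only per the whitelist above.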
