Reliably getting a web client IP - networking

What is the most reliable way of obtaining the IP address of a remote client connecting to your website? Some options I've looked into are:
Server variables (such as REMOTE_ADDR in Apache), though this is usually the proxy address.
A Java applet, but IE (at least the one I'm using) seems to deny it.
The only other thing I'm thinking about is having the client connect over HTTPS, in which case the proxy should be bypassed (generally speaking), and so REMOTE_ADDR would be accurate.
Any ideas?

Anything client-side (javascript, java) will give you the PCs IP address. Which could be an internal IP address like 10.0.0.1.
Re: SSL + REMOTE_ADDR, most workplace proxies send all the SSL through an application level proxy, SOME just allow 443 outbound. Any thing coming thru a proxy will still give you the proxy address, as the proxy is still the computer making the connection to your webserver.

HTTPS through a proxy is still a possibility, if the proxy is non-transparent (say, with a client on a corporate network). With HTTPS through a proxy, the REMOTE_ADDR will still be the proxy address - the proxy is still in the path, it just only gets to see the encrypted traffic.
If the client is going through a proxy, you'll have to rely on the proxy telling you their IP. The X-Forwarded-For header will contain this, but you can only really rely on this if you trust the proxy. If this is for logging purposes, log both REMOTE_ADDR and X-Forwarded-For. If it's for something else, you'll need to maintain a whitelist of proxies (as determined by REMOTE_ADDR) that you'll accept X-Forwarded-For from.

Related

Does internal communication between private servers use DNS and HTTPS?

I would like to know how internal communication links between private internal servers and a reverse proxy look.
When from my client (browser) I make a request to, say https://facebook.com, I hit Facebook's reverse proxy. I have two questions, when that reverse proxy gets a request and needs to forward it to the server that should handle it, does that sever it is forwarding the request to have a domain name or is it just an IP address ((user.facebook.com or useroffacebook.com v.s. 34.23.66.25 (DO NOT GO TO THAT ADDRESS I JUST MADE IT UP!!!)))? Also, does that connection use HTTP or HTTPS?
Like Kshitij Joshi already mentioned, it could be both.
A more detailed perspective for implementation:
reverse proxy should use IP addresses for routing so they are still working even if the DNS fails or is unavailable to the proxy for some reason.
internal traffic should also be encrypted (HTTPS). using plain text, even in internal networks, must be considered dangerous and is not recommended.
from my mindset you can replace the 'should' with a 'must'.

How does a HTTP Reverse Proxy work

I saw a website that offers HTTP Reverse proxy and TCP Reverse Proxy, I know what TCP Reverse Proxy is, but I have no clue what the difference is between HTTP & TCP Reverse proxy.
Try to explain it to someone that would like to buy it, because i want to know how it works. Those website's that offers it usually says "1 Protected Domain".
Contrary to a simple TCP reverse proxy a HTTP proxy can choose the target HTTP server based on the content of the HTTP request, i.e. based on the target host name (Host header), the path, Cookies, the User-Agent etc. It can also include additional headers like X-Forwarded-For to include the clients original IP address.

Do HTTPS connections require HTTPS proxies or can I use HTTP proxies?

The question is about HTTP vs HTTPS.
If I want to anonymously load a website that forces HTTPS, like Google.com, do I need an HTTPS proxies, or can I get away with HTTP proxies?
If your proxy is SOCKS it will not care what kind of socket is connecting through it. It has its own handshake and it does not care about what happens after the handshake. Whether after the SOCKS handshake an SSL handshake (HTTPS) is started it is not a SOCKS proxy problem, it will just pass through.
Several HTTP proxies on the other hand expect HTTP headers to guide them, such a HTTP proxy will not allow HTTPS since it needs to read the headers.
On the third hand (ekhm... well, foot?), an HTTP proxy that supports HTTP CONNECT can also setup the transfer of arbitrary data. Therefore such a proxy can setup any type of socket, which can have an SSL handshake, which can then be used for HTTPS transfer.
HTTP Proxy Server supports CONNECT verb which supports HTTPS connections within HTTP Proxy. You don't need special HTTPS proxy server or any other setup.
CONNECT verb allows you to create binary socket tunnel to any given IP:Port address. So any HTTP client (all browsers), will open secure tunnel and communicate securely over proxy server. However, no one cant control or see anything that is going through the tunnel unless they implement man in middle attack by sending you self-signed certificates.
Most firewall these days automatically implement man in middle self signed certificates that are deployed in work network, so you have to probably dig more to identify whether it is really secure or not. So it may not be that anonymous.
If you're trying to access a service anonymously, you won't get this by running your own proxy. It's not clear from the original question what is meant by "proxy", e.g. local service, or remote service. You won't get anonymity by surfing through a proxy that's on your network, unless it's something like a TOR proxy which relays out through the TOR network.
As for whether proxies can support HTTPS or not, that's been covered here, it would be unusual to find a proxy that doesn't support CONNECT. However if it's a remote anonymizing service you're using, I doubt they would do MitM, since you'd need to install the signing cert into your trusted root store, so they couldn't do that surreptitiously.

Should x-forwarded-for contain a proxy in https traffic?

I have a web server cluster behind a proxy/load balancer. That proxy contains my SSL certs and hands the web servers the decrypted traffic, and along the way adds an "x-forwarded-for" header into the HTTP header the web application receives. This application has seen millions of IP addresses over the past decade, but something weird happened today.
For the first time, I saw an x-forwarded-for that contained a second address reach the application [addressed altered]:
x-forwarded-for: 62.211.19.218, 177.168.159.85
This indicates that the traffic came through a proxy, and I understand this is normal for x-f-f. I would have thought this was impossible (or at least unlikely) with https as the protocol.
Can someone explain how this is legit?
As per RFC 7239, this HTTP header is specified as
X-Forwarded-For: client, proxy1, proxy2, ...
Where client is the IP of the original client and then each proxy adds the IP it received the request from, at the end of the list. In the above example, you would see IP of proxy3 in your webserver and proxy2 is the IP which connected to the proxy3.
As anyone can put anything inside this header, you should accept it only from known sources like your own reverse proxy or whitelist of known legit proxies. For example Apache has mod_rpaf, which transparently changes client IP address to the one provided in this header, but only if the request is received from the IP of known proxy server.
On corporate networks you can easily do transparent proxying for HTTPS traffic without any notice from normal users. Just create your own certification authority, use for example Windows Group Policy to install & trust this CA on all corporate workstations. Then redirect all HTTPS connections to your proxy which will generate certificate for all visited domains on the fly. This is something which is happening and you can even buy enterprise hardware proxies using this method.
So to summarize the reasons why you could see multiple IPs in the X-Forwarded-For header:
Transparent HTTPS proxy as mentioned above
The header was added by the requestor itself (browser, wget, script) for whatever reason, for example to hide its own IP
Some CDN like Cloudflare could add that header if used
Multiple reverse proxies defined either intentionally or by mistake
Conclusion: You should only trust this header if it originates from your own proxy (in case of multiple IPs, trust only the last one).
MAYBE it's using the Proxy protocol for HTTPS. Granted you may not be using httproxy, but this seems to be a decent description:
http://www.haproxy.org/download/1.5/doc/proxy-protocol.txt
I'm not sure about the SSL cert, but there's no guarantee someone is doing something pathalogical (maybe unintentionally) like running all their HTTPS traffic through a proxy and then accepting all the invalid certificates. But I suspect the proxy protocol might make this work; it does expose the HTTP headers to the proxy in some sense.

Real life usage of the X-Forwarded-Host header?

I've found some interesting reading on the X-Forwarded-* headers, including the Reverse Proxy Request Headers section in the Apache documentation, as well as the Wikipedia article on X-Forwarded-For.
I understand that:
X-Forwarded-For gives the address of the client which connected to the proxy
X-Forwarded-Port gives the port the client connected to on the proxy (e.g. 80 or 443)
X-Forwarded-Proto gives the protocol the client used to connect to the proxy (http or https)
X-Forwarded-Host gives the content of the Host header the client sent to the proxy.
These all make sense.
However, I still can't figure out a real life use case of X-Forwarded-Host. I understand the need to repeat the connection on a different port or using a different scheme, but why would a proxy server ever change the Host header when repeating the request to the target server?
If you use a front-end service like Apigee as the front-end to your APIs, you will need something like X-FORWARDED-HOST to understand what hostname was used to connect to the API, because Apigee gets configured with whatever your backend DNS is, nginx and your app stack only see the Host header as your backend DNS name, not the hostname that was called in the first place.
This is the scenario I worked on today:
Users access certain application server using "https://neaturl.company.com" URL which is pointing to Reverse Proxy. Proxy then terminates SSL and redirects users' requests to the actual application server which has URL of "http://192.168.1.1:5555". The problem is - when application server needed to redirect user to other page on the same server using absolute path, it was using latter URL and users don't have access to this. Using X-Forwarded-Host (+ X-Forwarded-Proto and X-Forwarded-Port) allowed our proxy to tell application server which URL user used originally and thus server started to generate correct absolute path in its responses.
In this case there was no option to stop application server to generate absolute URLs nor configure it for "public url" manually.
I can tell you a real life issue, I had an issue using an IBM portal.
In my case the problem was that the IBM portal has a rest service which retrieves an url for a resource, something like:
{"url":"http://internal.host.name/path"}
What happened?
Simple, when you enter from intranet everything works fine because internalHostName exists but... when the user enter from internet then the proxy is not able to resolve the host name and the portal crashes.
The fix for the IBM portal was to read the X-FORWARDED-HOST header and then change the response to something like:
{"url":"http://internet.host.name/path"}
See that I put internet and not internal in the second response.
For the need for 'x-forwarded-host', I can think of a virtual hosting scenario where there are several internal hosts (internal network) and a reverse proxy sitting in between those hosts and the internet. If the requested host is part of the internal network, the requested host resolves to the reverse proxy IP and the web browser sends the request to the reverse proxy. This reverse proxy finds the appropriate internal host and forwards the request sent by the client to this host. In doing so, the reverse proxy changes the host field to match the internal host and sets the x-forward-host to the actual host requested by the client. More details on reverse proxy can be found in this wikipedia page http://en.wikipedia.org/wiki/Reverse_proxy.
Check this post for details on x-forwarded-for header and a simple demo python script that shows how a web-server can detect the use of a proxy server: x-forwarded-for explained
One example could be a proxy that blocks certain hosts and redirects them to an external block page. In fact, I’m almost certain my school filter does this…
(And the reason they might not just pass on the original Host as Host is because some servers [Nginx?] reject any traffic to the wrong Host.)
X-Forwarded-Host just saved my life. CDNs (or reverse proxy if you'd like to go down to "trees") determine which origin to use by Host header a user comes to them with. Thus, a CDN can't use the same Host header to contact the origin - otherwise, the CDN would go to itself in a loop rather than going to the origin. Thus, the CDN uses either IP address or some dummy FQDN as the Host header fetching content from the origin. Now, the origin may wish to know what was the Host header (aka website name) the content is asked for. In my case, one origin served 2 websites.
Another scenario, you license your app to a host URL then you want to load balance across n > 1 servers.

Resources