How does an HTTP Reverse Proxy work?

I saw a website that offers HTTP Reverse Proxy and TCP Reverse Proxy. I know what a TCP reverse proxy is, but I have no clue what the difference between an HTTP and a TCP reverse proxy is.
Try to explain it as you would to someone who would like to buy it, because I want to know how it works. Websites that offer it usually say "1 Protected Domain".

Unlike a simple TCP reverse proxy, which forwards the byte stream to a backend without looking inside it, an HTTP reverse proxy parses the request and can choose the target HTTP server based on its content, i.e. the target host name (Host header), the path, cookies, the User-Agent, etc. It can also add headers such as X-Forwarded-For to pass the client's original IP address on to the backend.
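To make the difference concrete, here is a minimal Python sketch of a decision that only an HTTP reverse proxy can make, because only it parses the request. The routing table, host names, backend addresses and the "beta" cookie are hypothetical; real proxies such as nginx or HAProxy express the same kind of rules in their own configuration.

```python
# Hypothetical routing table: (host, path prefix, backend "ip:port").
# More specific prefixes must come first, because the first match wins.
ROUTES = [
    ("shop.example.com", "/api/", "10.0.0.21:8080"),
    ("shop.example.com", "/", "10.0.0.20:8080"),
    ("blog.example.com", "/", "10.0.0.30:8080"),
]
CANARY_BACKEND = "10.0.0.40:8080"   # hypothetical backend for beta testers
DEFAULT_BACKEND = "10.0.0.99:8080"  # hypothetical fallback


def parse_cookies(cookie_header: str) -> dict:
    """Turn 'a=1; beta=1' into {'a': '1', 'beta': '1'}."""
    cookies = {}
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


def choose_backend(headers: dict, path: str) -> str:
    """Pick a backend from fields a TCP proxy never sees (Host, path, Cookie)."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    # The Host header may carry a port ("shop.example.com:443"); keep the name only.
    host = h.get("host", "").split(":")[0].lower()
    # Routing on other headers works the same way, e.g. a cookie-based canary
    # that applies to every host in this sketch.
    if parse_cookies(h.get("cookie", "")).get("beta") == "1":
        return CANARY_BACKEND
    for route_host, prefix, backend in ROUTES:
        if host == route_host and path.startswith(prefix):
            return backend
    return DEFAULT_BACKEND
```

For example, choose_backend({"Host": "shop.example.com"}, "/api/orders") returns "10.0.0.21:8080", while the same Host with path "/cart" goes to "10.0.0.20:8080". A TCP reverse proxy has no access to any of these fields, so it can only pick a backend per listening port or per connection.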

Related

Why can HTTP/HTTPS proxy and Socks proxy work on one single port?

Many proxy programs provide multiple protocols over one port.
1.
Is there any byte (above or in the TCP/UDP packet) reserved to mark which protocol the client is using?
As far as I know, HTTP is just carried in TCP's data segment, and there is no other marker.
So how does the proxy software tell the protocol when it receives a request?
(By guessing from the first one or two bytes received? That doesn't sound like a good idea.)
2.
What's the difference between an HTTP proxy and an HTTPS proxy?
Here are my guesses:
"HTTP proxy" only means a service that can proxy the HTTP protocol, and an "HTTPS proxy" can also serve the HTTPS protocol? (That is, the only difference is whether it can handle the HTTP CONNECT method.) So an HTTPS proxy is just a functionally enhanced HTTP proxy.
Or
An HTTPS proxy offers an extra security layer between the client and the proxy server? (To protect the HTTP CONNECT method headers?) So the communication process is quite different between an HTTP proxy and an HTTPS proxy, and both an HTTP proxy and an HTTPS proxy can serve HTTP/HTTPS?
Is there any byte (above or in TCP/UDP package) reserved to mark which protocol client is using?
It is not a single byte, but an HTTP request is clearly distinguishable from a SOCKS request, as a look at the respective standards (RFC 7230, RFC 1928) would show: a SOCKS5 client starts with the version byte 0x05 (SOCKS4 with 0x04), while an HTTP request starts with an ASCII method name such as GET or CONNECT. Not all protocols can be distinguished this easily, but SOCKS and HTTP can.
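A minimal Python sketch of that check, assuming the proxy peeks at the first bytes of a new connection (for example with socket.recv(16, socket.MSG_PEEK), which leaves the data in the socket buffer) and then hands the connection to the matching handler. The function name and the method list are illustrative, not taken from any particular proxy.

```python
# Request-line prefixes of common HTTP methods (RFC 7231, plus PATCH from RFC 5789).
HTTP_METHODS = (
    b"GET ", b"HEAD ", b"POST ", b"PUT ", b"DELETE ",
    b"OPTIONS ", b"TRACE ", b"PATCH ", b"CONNECT ",
)


def sniff_protocol(first_bytes: bytes) -> str:
    """Classify a new proxy connection by the first bytes the client sent."""
    if not first_bytes:
        return "unknown"
    if first_bytes[0] == 0x05:   # SOCKS5 greeting: VER=5, NMETHODS, METHODS...
        return "socks5"
    if first_bytes[0] == 0x04:   # SOCKS4 request: VN=4, CD, DSTPORT, DSTIP...
        return "socks4"
    if first_bytes.startswith(HTTP_METHODS):  # e.g. b"CONNECT host:443 HTTP/1.1"
        return "http"
    # Anything else (or too few bytes so far): a real server would read more or reject.
    return "unknown"
```

Because a SOCKS message starts with a small binary version number and an HTTP request starts with printable letters, the very first byte already separates them; this follows from the two wire formats rather than from guesswork.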
What's the difference between HTTP proxy and HTTPS proxy?
An HTTPS proxy is an HTTP proxy used for https:// requests. The client uses the CONNECT method to create a tunnel through the proxy to the final server and then does end-to-end HTTPS (TLS+HTTP) inside this tunnel.
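The tunnel setup itself is a plain-text HTTP exchange, sketched below in Python. The helper names are hypothetical; after a 2xx reply, the client starts the TLS handshake with the final server over the same socket, so from then on the proxy only relays encrypted bytes.

```python
def build_connect_request(host: str, port: int) -> bytes:
    """The plain-text request a client sends to an HTTP proxy to open a tunnel."""
    target = f"{host}:{port}"
    return (
        f"CONNECT {target} HTTP/1.1\r\n"
        f"Host: {target}\r\n"
        "\r\n"
    ).encode("ascii")


def tunnel_established(response_head: bytes) -> bool:
    """Any 2xx status to CONNECT means the proxy now relays raw bytes both ways."""
    status_line = response_head.split(b"\r\n", 1)[0]
    parts = status_line.split(b" ", 2)
    return (
        len(parts) >= 2
        and parts[0].startswith(b"HTTP/")
        and parts[1].isdigit()
        and 200 <= int(parts[1]) < 300
    )
```

In Python's standard library, http.client.HTTPSConnection.set_tunnel() performs this CONNECT step for you before the TLS handshake.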
... Or HTTPS proxy offers extra security layer between client and proxy server?
This exists too, but it is usually called "HTTP proxy over TLS", "HTTP proxy over HTTPS", "encrypted proxy connection" or similar, but not "HTTPS proxy".

In a reverse proxy server + Python HTTPS Server, who should handle SSL Certificates for HTTPS connections?

Suppose I want to use a combination of nginx (or probably another server, since nginx doesn't proxy HTTP/2 requests to the backend) and Hypercorn. As both can handle SSL certificate files, I wonder which one is best suited to do this for an HTTPS request. It is important to me that Hypercorn can listen on port 443, and I'm not sure it can do that without specifying the certfile and keyfile parameters.
Well, that depends on what you want to do.
The simplest solution is to configure both to use SSL.
Nginx will receive the request, decrypt it, process it, and send it to Hypercorn on port 443 as an HTTPS client. Hypercorn will get the request as it would from any normal HTTPS client.
If your goal is security: go with both.
If your goal is just to not have Hypercorn exposed directly, you can configure it to not use SSL.
Nginx supports proxying requests to an HTTPS upstream by default, so that's the best solution, I think. However, you might need to set HTTP headers such as X-Forwarded-For, X-Forwarded-Host and any others Hypercorn needs, so that it correctly understands who the client is.
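On the Hypercorn side, the application then has to trust these headers only when the request really came from nginx. A minimal Python sketch, assuming nginx runs on the same machine (127.0.0.1) and sets X-Forwarded-Proto and X-Forwarded-Host (for example with proxy_set_header X-Forwarded-Proto $scheme; and proxy_set_header X-Forwarded-Host $host;). The function name and its defaults are hypothetical.

```python
TRUSTED_PROXY = "127.0.0.1"  # hypothetical: nginx runs on the same machine


def original_url(remote_addr: str, headers: dict, path: str,
                 default_scheme: str = "https", default_host: str = "localhost") -> str:
    """Rebuild the URL the browser used, trusting X-Forwarded-* only from nginx."""
    h = {name.lower(): value for name, value in headers.items()}
    host = h.get("host", default_host)
    scheme = default_scheme
    if remote_addr == TRUSTED_PROXY:
        # Take the first entry in case a header carries a comma-separated list.
        scheme = h.get("x-forwarded-proto", scheme).split(",")[0].strip()
        host = h.get("x-forwarded-host", host).split(",")[0].strip()
    return f"{scheme}://{host}{path}"
```

A request that does not come from the trusted address keeps its own Host header, so a client talking to Hypercorn directly cannot spoof the public host name or scheme.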

Why is routing a request based on its Host header a good way to proxy traffic?

In the proxy documentation for Kong, it is mentioned that "Routing a request based on its Host header is the most straightforward way to proxy traffic through Kong, especially since this is the intended usage of the HTTP Host header".
However, for this to work, any incoming request from a client must now have its Host header set to a particular value. In general, HTTP clients don't intentionally modify this value, so how is this used in practice?
In other words, clients aren't in general modifying the HTTP host header in their request, as is done in the curl examples in the docs, e.g.:
curl --url http://proxy.mydomain.com:8000/ --header 'Host: service.example.com'
Given that the proxy is intended to be transparent to clients, why is it the case that 'this is the intended usage of the HTTP Host header'?
If the proxy is transparent to the client, the client usually doesn't know that a proxy is used and therefore resolves the server's IP address via DNS. The client then establishes a TCP connection to that IP address.
The (transparent) proxy then intercepts the traffic. The Host header is now its only chance to get the server's FQDN. This is important if the connection is HTTPS, so that the proxy can use the Host header value as the SNI and to verify the server's certificate.
Independent of the use of a transparent proxy, the Host header should contain the server name, which allows hosting multiple websites on the same server hardware.
Example:
Server IP 1.2.3.4 with 4 websites: www.a.com, www.b.com, www.c.com, www.d.com.
The client must provide the name of the website in the Host header so that the server can distinguish between the different websites.
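Name-based virtual hosting boils down to a lookup keyed on the Host header, normalized first because clients may send a port, uppercase letters or a trailing dot. A minimal Python sketch using the four sites from the example; the document roots are hypothetical. A TLS-terminating proxy would pass the same normalized name as server_hostname to ssl.SSLContext.wrap_socket() so that it is sent as SNI and checked against the upstream certificate.

```python
from typing import Optional

# The four sites from the example; the document roots are hypothetical.
SITES = {
    "www.a.com": "/srv/a",
    "www.b.com": "/srv/b",
    "www.c.com": "/srv/c",
    "www.d.com": "/srv/d",
}


def host_name(host_header: str) -> str:
    """Normalize a Host header value to a bare, lowercase host name."""
    value = host_header.strip().lower()
    if value.startswith("["):            # IPv6 literal such as "[::1]:8080"
        end = value.find("]")
        return value[1:end] if end != -1 else value
    # Drop an optional ":port" and a trailing dot ("www.a.com." is the same name).
    return value.partition(":")[0].rstrip(".")


def site_root(host_header: str) -> Optional[str]:
    """Return the document root for the requested site, or None (e.g. answer 404)."""
    return SITES.get(host_name(host_header))
```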

Should x-forwarded-for contain a proxy in https traffic?

I have a web server cluster behind a proxy/load balancer. That proxy contains my SSL certs and hands the web servers the decrypted traffic, and along the way adds an "x-forwarded-for" header into the HTTP header the web application receives. This application has seen millions of IP addresses over the past decade, but something weird happened today.
For the first time, I saw an x-forwarded-for that contained a second address reach the application [addresses altered]:
x-forwarded-for: 62.211.19.218, 177.168.159.85
This indicates that the traffic came through a proxy, and I understand this is normal for x-f-f. I would have thought this was impossible (or at least unlikely) with https as the protocol.
Can someone explain how this is legit?
X-Forwarded-For is a de facto standard (RFC 7239 standardizes its successor, the Forwarded header); its conventional format is
X-Forwarded-For: client, proxy1, proxy2, ...
where client is the IP address of the original client, and each proxy appends the IP address it received the request from to the end of the list. In the example above, your web server would see proxy3's IP as the connecting address, and proxy2 is the IP that connected to proxy3.
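The append rule can be simulated in a few lines of Python. The addresses come from the documentation ranges and the function names are hypothetical; the point is that the last proxy's own address never appears in the header, it is only visible as the TCP peer (REMOTE_ADDR) of your web server.

```python
def forward(headers: dict, peer_ip: str) -> dict:
    """One proxy hop: append the address the request came from to X-Forwarded-For."""
    out = dict(headers)
    existing = out.get("X-Forwarded-For")
    out["X-Forwarded-For"] = f"{existing}, {peer_ip}" if existing else peer_ip
    return out


def simulate_chain(client_ip: str, proxy_ips: list) -> tuple:
    """Send a request from client_ip through proxy_ips in order.

    Returns (headers seen by the web server, REMOTE_ADDR seen by the web server).
    """
    headers = {}
    peer = client_ip
    for proxy_ip in proxy_ips:
        headers = forward(headers, peer)  # the proxy records who connected to it...
        peer = proxy_ip                   # ...then connects onward from its own IP
    return headers, peer


headers, remote_addr = simulate_chain("198.51.100.7", ["192.0.2.1", "192.0.2.2", "192.0.2.3"])
print(headers["X-Forwarded-For"])  # 198.51.100.7, 192.0.2.1, 192.0.2.2  (client, proxy1, proxy2)
print(remote_addr)                 # 192.0.2.3  (proxy3)
```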
As anyone can put anything inside this header, you should accept it only from known sources, like your own reverse proxy or a whitelist of known legitimate proxies. For example, Apache has mod_rpaf, which transparently changes the client IP address to the one provided in this header, but only if the request is received from the IP of a known proxy server.
On corporate networks, you can easily do transparent proxying of HTTPS traffic without normal users noticing. Just create your own certificate authority and use, for example, Windows Group Policy to install and trust this CA on all corporate workstations. Then redirect all HTTPS connections to your proxy, which will generate certificates for all visited domains on the fly. This does happen in practice, and you can even buy enterprise hardware proxies that use this method.
So to summarize the reasons why you could see multiple IPs in the X-Forwarded-For header:
Transparent HTTPS proxy as mentioned above
The header was added by the requester itself (browser, wget, script) for whatever reason, for example to hide its own IP
Some CDNs, like Cloudflare, could add that header if used
Multiple reverse proxies defined either intentionally or by mistake
Conclusion: You should only trust this header if it originates from your own proxy (in case of multiple IPs, trust only the last one).
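A minimal Python sketch of that conclusion, assuming your own proxy connects from a single known address (10.0.0.5 here, hypothetical). In the question's header, 177.168.159.85 is the entry your load balancer appended itself, while 62.211.19.218 was already in the request when it arrived and is therefore unverified.

```python
import ipaddress
from typing import Optional

MY_PROXY = "10.0.0.5"  # hypothetical address of your own reverse proxy / load balancer


def client_ip(remote_addr: str, xff: Optional[str]) -> str:
    """Trust X-Forwarded-For only from your own proxy, and then only its last entry."""
    if remote_addr != MY_PROXY or not xff:
        # Direct connection (or no header): the TCP peer is the only reliable fact.
        return remote_addr
    last = xff.split(",")[-1].strip()  # the entry your proxy appended itself
    try:
        ipaddress.ip_address(last)
    except ValueError:
        return remote_addr  # malformed value: fall back to what we actually saw
    return last
```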
MAYBE it's using the PROXY protocol for HTTPS. Granted, you may not be using HAProxy, but this seems to be a decent description:
http://www.haproxy.org/download/1.5/doc/proxy-protocol.txt
I'm not sure about the SSL cert, but there's no guarantee someone isn't doing something pathological (maybe unintentionally), like running all their HTTPS traffic through a proxy and then accepting all the invalid certificates. But I suspect the PROXY protocol might make this work: it prepends the original client's connection details to the TCP stream, so the address reaches the next proxy without the HTTPS traffic being decrypted.
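For reference, version 1 of the PROXY protocol described in that document is a single human-readable line, at most 107 bytes including CRLF, sent before any payload, e.g. "PROXY TCP4 192.0.2.10 203.0.113.5 56324 443". A minimal Python parser sketch (names hypothetical) shows why it works for HTTPS: the line is split off and the rest of the stream, such as a TLS ClientHello, is left untouched. The PROXY protocol itself does not write X-Forwarded-For; whatever receives it would have to copy the source address there.

```python
from typing import NamedTuple, Optional, Tuple

MAX_V1_LINE = 107  # maximum v1 header line length, CRLF included, per the spec


class ProxyHeader(NamedTuple):
    family: str    # "TCP4" or "TCP6"
    src_addr: str  # the original client
    dst_addr: str
    src_port: int
    dst_port: int


def parse_proxy_v1(data: bytes) -> Tuple[Optional[ProxyHeader], bytes]:
    """Split a PROXY protocol v1 line off the start of a TCP stream.

    Returns (header, payload); header is None for "PROXY UNKNOWN". The payload,
    e.g. an encrypted TLS ClientHello, is returned untouched.
    """
    end = data.find(b"\r\n")
    if not data.startswith(b"PROXY ") or end == -1 or end + 2 > MAX_V1_LINE:
        raise ValueError("no PROXY protocol v1 header")
    fields = data[:end].decode("ascii").split(" ")
    payload = data[end + 2:]
    if fields[1] == "UNKNOWN":
        return None, payload
    if len(fields) != 6 or fields[1] not in ("TCP4", "TCP6"):
        raise ValueError("malformed PROXY protocol v1 header")
    _, family, src, dst, src_port, dst_port = fields
    return ProxyHeader(family, src, dst, int(src_port), int(dst_port)), payload
```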

Reliably getting a web client IP

What is the most reliable way of obtaining the IP address of a remote client connecting to your website? Some options I've looked into are:
Server variables (such as REMOTE_ADDR in Apache), though this is usually the proxy address.
A Java applet, but IE (at least the one I'm using) seems to deny it.
The only other thing I'm thinking about is having the client connect over HTTPS, in which case the proxy should be bypassed (generally speaking), and so REMOTE_ADDR would be accurate.
Any ideas?
Anything client-side (JavaScript, Java) will give you the PC's IP address, which could be an internal IP address like 10.0.0.1.
Re: SSL + REMOTE_ADDR, most workplace proxies send all SSL traffic through an application-level proxy; SOME just allow outbound 443. Anything coming through a proxy will still give you the proxy address, as the proxy is still the computer making the connection to your web server.
HTTPS through a proxy is still a possibility, if the proxy is non-transparent (say, with a client on a corporate network). With HTTPS through a proxy, the REMOTE_ADDR will still be the proxy address - the proxy is still in the path, it just only gets to see the encrypted traffic.
If the client is going through a proxy, you'll have to rely on the proxy telling you their IP. The X-Forwarded-For header will contain this, but you can only really rely on this if you trust the proxy. If this is for logging purposes, log both REMOTE_ADDR and X-Forwarded-For. If it's for something else, you'll need to maintain a whitelist of proxies (as determined by REMOTE_ADDR) that you'll accept X-Forwarded-For from.
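A minimal Python sketch of the whitelist approach, using the standard ipaddress module so that whole proxy networks can be listed. The networks are hypothetical. It walks X-Forwarded-For from the right and stops at the first address that is not one of your proxies, which extends "trust only the last entry" to a chain of known proxies, and it keeps both raw values for logging as suggested above.

```python
import ipaddress
from typing import Optional

# Hypothetical whitelist: networks your own proxies connect from.
TRUSTED_PROXIES = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "192.0.2.0/24")]


def is_trusted(ip: str) -> bool:
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in TRUSTED_PROXIES)


def effective_client_ip(remote_addr: str, xff: Optional[str]) -> str:
    """Walk X-Forwarded-For right to left, skipping addresses of trusted proxies."""
    if not is_trusted(remote_addr) or not xff:
        return remote_addr
    candidate = remote_addr
    for entry in reversed([e.strip() for e in xff.split(",")]):
        try:
            ipaddress.ip_address(entry)
        except ValueError:
            break               # garbage: keep the last address a trusted hop vouched for
        if not is_trusted(entry):
            return entry        # first hop that is not one of our proxies
        candidate = entry
    return candidate


def log_record(remote_addr: str, xff: Optional[str]) -> dict:
    """Keep both raw values for logging, plus the derived client address."""
    return {
        "remote_addr": remote_addr,
        "x_forwarded_for": xff or "",
        "client_ip": effective_client_ip(remote_addr, xff),
    }
```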
