Confusion about the Forwarded header - http

The new standard for headers (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Forwarded) is confusing to me. I also tried reading the specification (https://datatracker.ietf.org/doc/html/rfc7239#section-4) and it's equally confusing.
I have a basic configuration like this:
client -> ingress (load balancer) -> reverse proxy -> service
The Forwarded header defines four fields as follows:
by - The interface where the request came in to the proxy server.
for - The client that initiated the request and subsequent proxies in a chain of proxies.
host - The Host request header field as received by the proxy.
proto - Indicates which protocol was used to make the request (typically "http" or "https").
What would each of these be set to in my case (excluding proto, that one is obvious)? The "for" seems to just be the client host, but then I don't understand what it means by a "chain of proxies". I don't think it applies in my example, but I still want to understand.
My proxy has a registry that looks up all of my services, so as of now, I set this header inside the proxy. Is that where it is intended to be set? Thanks in advance to anyone who can answer.

I am not aware of any Kubernetes Ingress Controller that implements this header. X-Forwarded-For and X-Forwarded-Proto are what basically everything uses.
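As for the "chain of proxies" wording: in RFC 7239 each proxy that handles the request appends its own comma-separated element, where "for" is the node the request came from and "by" is the interface that received it. Below is a minimal sketch of what that could look like for the client -> ingress -> reverse proxy -> service setup. The IP addresses, the host name and the helper names are made up for illustration, and it assumes the ingress actually emits the header, which many do not.

```python
import ipaddress


def node_token(address: str) -> str:
    """Format an IP address as an RFC 7239 node value.

    IPv6 addresses must be bracketed and quoted; IPv4 addresses go as-is.
    """
    ip = ipaddress.ip_address(address)
    if ip.version == 6:
        return f'"[{ip.compressed}]"'
    return ip.compressed


def append_forwarded(existing, peer_ip, local_ip, host, proto):
    """Return the Forwarded value a proxy would send to the next hop.

    existing: Forwarded value received from the previous hop, or None.
    peer_ip:  address the request came from (goes into "for").
    local_ip: proxy interface that received the request (goes into "by").
    host:     Host header as received; a value with a port would need quoting.
    """
    element = ";".join([
        f"for={node_token(peer_ip)}",
        f"by={node_token(local_ip)}",
        f"host={host}",
        f"proto={proto}",
    ])
    return f"{existing}, {element}" if existing else element


# Hypothetical addresses for: client -> ingress -> reverse proxy -> service
CLIENT, INGRESS, PROXY = "203.0.113.7", "10.0.0.1", "10.0.0.2"

# The ingress sees the client as its peer.
at_ingress = append_forwarded(None, CLIENT, INGRESS, "example.com", "https")
# The reverse proxy sees the ingress as its peer and appends its own element.
at_proxy = append_forwarded(at_ingress, INGRESS, PROXY, "example.com", "http")
print(at_proxy)
```

The service then reads the leftmost "for" as the original client, provided it trusts every hop that wrote to the header.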

Related

Kubernetes Ingress/Reverse and Forward Proxy with ssl interception

I have a requirement that incoming as well as outgoing SSL traffic to a pod in a namespace has to terminate at the same proxy. This proxy should look at a special part of the header of the packet and decide whether the packet is allowed in or out; if not, the proxy has to send a 403.
I already took a look at Istio and Envoy, but I couldn't find a solution for my problem.
Now I have decided to start a separate NGINX pod in my namespace and always route the traffic through it, so I'll be able to create a custom Python module that does the checks for me.
But I would rather work with native methods if possible. Do you have any idea what to use for this scenario?
Client -> nginx ingress (SSL passthrough) -> nginx (reverse/forward) proxy -> app
app -> nginx (reverse/forward) proxy -> Client
EDIT: or should I take a look at squid or something like that? :O

What is the correct way to render absolute URLs behind a reverse proxy?

I have a web application running on a server (let's say on localhost:8000) behind a reverse proxy on that same server (on myserver.example:80). Because of the way the reverse proxy works, the application sees an incoming request targeted at localhost:8000 and the framework I'm using therefore tries to generate absolute URLs that look like localhost:8000/some/ressource instead of myserver.example/some/ressource.
What would be "the correct way" of generating an absolute URL (namely, determining what hostname to use) from behind a proxy server like that? The specific proxy server, framework and language don't matter, I mean this more in an HTTP sense.
From my initial research:
RFC7230 explicitly says that proxies MUST change the Host header when passing the request along to make it look like the request came from them, so it would look like using Host to determine what hostname to use for the URL, yet in most places where I have looked, the general advice seems to be to configure your reverse proxy to not change the Host header (counter to the spec) when passing the request along.
RFC7230 also says that "request URI reconstruction" should use the following fields in order to find what "authority component" to use, though that seems to also only apply from the point-of-view of the agent that emitted that request, such as the proxy:
Fixed URI authority component from the server or outbound gateway config
The authority component from the request's first line if it's a complete URI instead of a path
The Host header if it's present and not empty
The listening address or hostname, along with the incoming port number if it's not the default one for the protocol
HTTP 1.0 didn't have a Host header at all, and that header was added for routing purposes, not for URL authority resolution.
There are headers made specifically to let proxies send the old value of Host after routing, such as Via, Forwarded and the unofficial X-Forwarded-Host, which some servers and frameworks will check, but not all, and it's unclear which one should take priority given that there are three of them.
EDIT: I also don't know whether HTTPS would work differently in that regard, given that the headers are part of the encrypted payload and routing has to be performed another way because of this.
In general I find it’s best to set the real host and port explicitly in the application rather than try to guess these from the incoming request.
So for example Jira allows you to set the Base URL through which Jira will be accessed (which may be different to the one that it is actually run as). This means you can have Jira running on port 8080 and have Apache or Nginx in front of it (on the same or even a different server) on port 80 and 443.
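A minimal sketch of the "configure it explicitly" approach, assuming a hypothetical BASE_URL setting and helper name; no request header is consulted at all, so nothing a client or proxy sends can change the generated links.

```python
# Hypothetical setting: the public address users type into their browser.
BASE_URL = "https://myserver.example/"


def absolute_url(path: str, base_url: str = BASE_URL) -> str:
    """Build an absolute URL from the configured base, ignoring request headers."""
    # Normalise the slashes so a base URL with a context path is preserved.
    return base_url.rstrip("/") + "/" + path.lstrip("/")


print(absolute_url("/some/resource"))
print(absolute_url("/issues/1", "https://myserver.example/jira"))
```

The trade-off is that the setting has to be kept in sync with the proxy configuration by hand.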

How does an HTTP Reverse Proxy work

I saw a website that offers an HTTP reverse proxy and a TCP reverse proxy. I know what a TCP reverse proxy is, but I have no clue what the difference between an HTTP and a TCP reverse proxy is.
Try to explain it as you would to someone who would like to buy one, because I want to know how it works. The websites that offer it usually say "1 Protected Domain".
Unlike a simple TCP reverse proxy, an HTTP reverse proxy can choose the target HTTP server based on the content of the HTTP request, i.e. based on the target host name (Host header), the path, cookies, the User-Agent etc. It can also add headers like X-Forwarded-For to pass along the client's original IP address.
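To make the difference concrete, here is a rough sketch of the routing decision an HTTP reverse proxy makes and a TCP proxy cannot, since the latter never parses the request. The routing table, addresses and function name are invented for the example, and header names are assumed to arrive in canonical case.

```python
# Hypothetical routing table: Host header -> backend address.
BACKENDS = {
    "shop.example.com": ("10.0.0.10", 8080),
    "blog.example.com": ("10.0.0.20", 8080),
}


def route(headers: dict, client_ip: str):
    """Pick a backend from the Host header and build the headers to forward."""
    # Ignore any port in the Host value (IPv6 literals are not handled here).
    host = headers.get("Host", "").split(":")[0].lower()
    backend = BACKENDS.get(host)
    if backend is None:
        return None, None  # a real proxy would answer with an error here
    forwarded = dict(headers)
    # Append the client address to any X-Forwarded-For chain already present.
    previous = headers.get("X-Forwarded-For")
    forwarded["X-Forwarded-For"] = f"{previous}, {client_ip}" if previous else client_ip
    return backend, forwarded


backend, outgoing = route({"Host": "blog.example.com:80"}, "198.51.100.4")
print(backend, outgoing["X-Forwarded-For"])
```

A real proxy could key the same lookup on the path, a cookie or the User-Agent instead of the host name.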

Real life usage of the X-Forwarded-Host header?

I've found some interesting reading on the X-Forwarded-* headers, including the Reverse Proxy Request Headers section in the Apache documentation, as well as the Wikipedia article on X-Forwarded-For.
I understand that:
X-Forwarded-For gives the address of the client which connected to the proxy
X-Forwarded-Port gives the port the client connected to on the proxy (e.g. 80 or 443)
X-Forwarded-Proto gives the protocol the client used to connect to the proxy (http or https)
X-Forwarded-Host gives the content of the Host header the client sent to the proxy.
These all make sense.
However, I still can't figure out a real life use case of X-Forwarded-Host. I understand the need to repeat the connection on a different port or using a different scheme, but why would a proxy server ever change the Host header when repeating the request to the target server?
If you use a service like Apigee as the front end to your APIs, you will need something like X-FORWARDED-HOST to know which hostname was used to connect to the API. Because Apigee is configured with whatever your backend DNS name is, nginx and your app stack only see the backend DNS name in the Host header, not the hostname that was called in the first place.
This is the scenario I worked on today:
Users access a certain application server using the URL "https://neaturl.company.com", which points to a reverse proxy. The proxy then terminates SSL and forwards users' requests to the actual application server, which has the URL "http://192.168.1.1:5555". The problem is that when the application server needed to redirect a user to another page on the same server using an absolute URL, it used the latter URL, which users cannot reach. Using X-Forwarded-Host (plus X-Forwarded-Proto and X-Forwarded-Port) allowed our proxy to tell the application server which URL the user originally used, and the server then started to generate correct absolute URLs in its responses.
In this case there was no option to stop the application server from generating absolute URLs, nor to configure a "public URL" for it manually.
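What the application server does with those headers can be sketched roughly as follows. The function name and the fallback behaviour are assumptions, and the result is only trustworthy if the headers are set (and overwritten) by a proxy you control.

```python
DEFAULT_PORTS = {"http": "80", "https": "443"}


def original_url(headers: dict, path: str) -> str:
    """Rebuild the URL the user requested from X-Forwarded-* headers.

    Falls back to the plain Host header and "http" when a header is absent.
    """
    proto = headers.get("X-Forwarded-Proto", "http")
    host = headers.get("X-Forwarded-Host") or headers.get("Host", "localhost")
    port = headers.get("X-Forwarded-Port")
    # Only add the port when it is non-default and not already part of the host.
    if port and port != DEFAULT_PORTS.get(proto) and ":" not in host:
        host = f"{host}:{port}"
    return f"{proto}://{host}{path}"


request_headers = {
    "Host": "192.168.1.1:5555",
    "X-Forwarded-Host": "neaturl.company.com",
    "X-Forwarded-Proto": "https",
    "X-Forwarded-Port": "443",
}
print(original_url(request_headers, "/login"))
```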
I can tell you about a real-life issue I had using an IBM portal.
In my case the problem was that the IBM portal has a REST service which retrieves a URL for a resource, something like:
{"url":"http://internal.host.name/path"}
What happened?
Simple: when you come in from the intranet everything works fine because the internal host name exists, but when a user comes in from the internet, the proxy is not able to resolve the host name and the portal crashes.
The fix for the IBM portal was to read the X-FORWARDED-HOST header and then change the response to something like:
{"url":"http://internet.host.name/path"}
See that I put internet and not internal in the second response.
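A rough idea of that kind of fix, not the actual IBM portal code: swap the host of the "url" field for the value of X-Forwarded-Host when the header is present. The function name and the shape of the payload are assumptions based on the example above.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def rewrite_url_host(body: str, headers: dict) -> str:
    """Replace the host in the JSON "url" field with X-Forwarded-Host, if set."""
    public_host = headers.get("X-Forwarded-Host")
    if not public_host:
        return body  # request came in directly, keep the internal name
    payload = json.loads(body)
    parts = urlsplit(payload["url"])
    # Keep scheme, path and query; only the host part changes.
    payload["url"] = urlunsplit(parts._replace(netloc=public_host))
    return json.dumps(payload)


internal = '{"url":"http://internal.host.name/path"}'
print(rewrite_url_host(internal, {"X-Forwarded-Host": "internet.host.name"}))
```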
For the need for 'x-forwarded-host', I can think of a virtual hosting scenario where there are several internal hosts (internal network) and a reverse proxy sitting in between those hosts and the internet. If the requested host is part of the internal network, the requested host resolves to the reverse proxy IP and the web browser sends the request to the reverse proxy. This reverse proxy finds the appropriate internal host and forwards the request sent by the client to this host. In doing so, the reverse proxy changes the host field to match the internal host and sets the x-forwarded-host to the actual host requested by the client. More details on reverse proxies can be found on this Wikipedia page: http://en.wikipedia.org/wiki/Reverse_proxy.
Check this post for details on x-forwarded-for header and a simple demo python script that shows how a web-server can detect the use of a proxy server: x-forwarded-for explained
One example could be a proxy that blocks certain hosts and redirects them to an external block page. In fact, I’m almost certain my school filter does this…
(And the reason they might not just pass on the original Host as Host is because some servers [Nginx?] reject any traffic to the wrong Host.)
X-Forwarded-Host just saved my life. CDNs (or reverse proxy if you'd like to go down to "trees") determine which origin to use by Host header a user comes to them with. Thus, a CDN can't use the same Host header to contact the origin - otherwise, the CDN would go to itself in a loop rather than going to the origin. Thus, the CDN uses either IP address or some dummy FQDN as the Host header fetching content from the origin. Now, the origin may wish to know what was the Host header (aka website name) the content is asked for. In my case, one origin served 2 websites.
Another scenario: you license your app to a host URL, then you want to load balance across n > 1 servers.

How does web application restore original URL after HTTP proxy or load balancer?

You deploy a web application (in my case Java EE + Spring MVC, but I don't think it matters which web stack is used) and hide it behind an HTTP proxy or load balancer.
The proxy/balancer software can fix HTTP headers. That is not the question.
But the application itself puts links into the generated HTML:
...
...
The proxy/balancer can use a different $HOST:$PORT or $CONTEXT part. In the case of Java EE with JSP, this piece of code fixes the issue:
<c:url value="$PATH">
${pageContext.request.contextPath}/$PATH
I want to know how a web framework learns the user-requested $HOST:$PORT/$CONTEXT so that it can be rendered in HTML.
Is this info extracted from the non-standard, de facto X-Forwarded-For header (http://en.wikipedia.org/wiki/X-Forwarded-For)? It looks like:
X-Forwarded-For: client, proxy1, proxy2, ..., proxyN
so does the web framework extract the second argument (which is proxy1 in my example, or the host IP if N == 0) to provide you with $HOST:$PORT/$CONTEXT?
This is going to be dependent on your particular proxy or load balancer. X-Forwarded-For is very common, but it usually only tells you about the IP address of the original request.
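To show what X-Forwarded-For does give you, here is a small sketch that picks the client address out of the list. The function name and the trusted_hops parameter are made up for the example; the entries are addresses of the nodes in the chain, so no $HOST:$PORT/$CONTEXT can be derived from them.

```python
def client_ip(x_forwarded_for: str, trusted_hops: int = 1) -> str:
    """Return the address recorded by the outermost proxy you trust.

    trusted_hops is the number of proxies you operate in front of the app.
    Each one appends the address it received the request from, so entries
    further left than that may have been supplied by the client itself.
    """
    addresses = [part.strip() for part in x_forwarded_for.split(",") if part.strip()]
    if not addresses:
        raise ValueError("empty X-Forwarded-For header")
    index = max(len(addresses) - trusted_hops, 0)
    return addresses[index]


# Two trusted proxies: the first recorded the client, the second recorded the first.
print(client_ip("203.0.113.7, 10.0.0.1", trusted_hops=2))
```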
In AWS you can use three headers to construct more of the original URL:
X-Forwarded-For
X-Forwarded-Proto
X-Forwarded-Port
Apache uses these and you can configure other custom headers with additional data:
X-Forwarded-For - The IP address of the client.
X-Forwarded-Host - The original host requested by the client in the Host HTTP request header.
X-Forwarded-Server - The hostname of the proxy server.
In Azure, these headers are available:
x-original-host
x-original-path
Bottom line is that there is no standard way of re-constructing the original URL. You will have to use the documentation of whatever proxy you are using. Some data may be missing. In some cases you may be able to configure the proxy to send missing data in custom headers.
