I'm running an nginx service in a docker container with Google Container Engine which forwards specific domain names to other services, like API, Frontend, etc. I have simple cluster for that with configured services. Nginx Service is Load Balance.
The REMOTE_ADDR environmental variable always contains an internal address in the Kubernetes cluster. I looked for is HTTP_X_FORWARDED_FOR but it's missing from the request headers. Is it possible to configure the service to save the external client ip in the requests?
With the current implementation of L3 balancing (as of Kubernetes 1.4) it isn't possible to get the source IP address for a connection to your service.
It sounds like your use case might be well served by using an Ingress object (or by manually creating an HTTP/S load balancer), which will put the source IP address into a the X-Forwarded-For HTTP header for easy retrieval by your backends.
Related
I have an application that needs to use a proxy (call it proxy1) to access some https endpoints outside of its network. The application doesn't support proxy settings, so I'd like to provide it a reverse proxy url, and I would prefer not to provide tls certs for proxy1, so I would use http for application -> proxy1.
I don't have access to the application host or forward proxy mentioned below, so I cannot configure networking there.
The endpoints the application needs are https, so proxy1 must make its outbound connections via https.
Finally, this whole setup is within a corporate network that requires a forward proxy (call it proxy2) for outbound internet, so my proxy1 needs to chain to proxy2 / use it as a parent.
I tried squid and it worked well for http only, but I couldn't get it to accept http inbound while using https outbound. Squid easily supported the parent proxy2.
I tried haproxy, but had the same result as with squid.
I tried nginx and it did what I wanted with http -> proxy -> https, but doesn't support a parent proxy. I considered setting up socat as in this answer, or using proxy_pass and proxy_set_header as in this answer, but I can't shake the feeling there's a cleaner way to achieve the requirements.
This doesn't seem like an outlandish setup, is it? Or is there a preferred approach for it? Ideally one using squid or nginx.
You can achive this without the complexity by using a port forwarder like socat. Just install it on a host to do the forwarding (or locally on the app server if you wish to) and create a listener that forwards connections through the proxy server. Then on your application host use a local name resolution overide to map the FQDN to the forwarder.
So, the final config should be the app server using a URI that points to the forwarding server (using its address if no name resolution excists), which has a socat listener that points to the the corporate proxy. No reverse proxy required.
socat TCP4-LISTEN:443,reuseaddr,fork \
PROXY:{proxy_address}:{endpoint_fqdn}:443,proxyport={proxy_port}
Just update with your parameters.
I have the following setup:
Client -> AWS ELB -> Nginx Ingress -> Pod
In the ELB logs, I can see the real IP of these clients. ELB sends it as the X-Forwarded-For header value to my Ingress controller.
I need to set the whitelist-source-range in the Ingress for the application, but the issue is that it uses the remote IP address, not the one in the X-Forwarded-For header.
I can see some solutions here:
Transform ALB into an NLB, so it preserves the originating client's IP
Make the Nginx controller source range whitelist based on the X-Forwarded-For header
Make the Nginx controller transform the request originating IP into the one in the header
The first is not ideal for me. I didn't want to maintain and pay for another load balancer. I don't know if the second is possible. I think the third is feasible, yet I have no idea how to do it. I know there's something related, which is the proxy protocol, but I don't see how it works, and I don't want to add something I don't understand into my production environment.
The load balancer is for several applications in my Kubernetes environment, so adding these IPs to the whitelist in the security group is not ideal.
How could I solve this issue?
My last resource will be to use Cloudflare. I want to keep as much of my configuration as possible inside Kubernetes, but I'll go for it if it's impossible.
Edit: this doesn't solve my problems, I have CIDRs to whitelist, not a specific IP.
So if you are on AWS, why are you using an Nginx Ingress controller?
You can use the AWS Load Balancer Controller, which will provision and/or manage AWS ALB's automatically.
For your particular usecase, you could add a WAF WebACL for a particular target in the ALB. You can do that manually, or use the alb.ingress.kubernetes.io/wafv2-acl-arn annotation for the AWS Load Balancer Controller.
Alternatively, you could run a separate ALB in a separate subnet, and setup IP whitelisting via the security group of that ALB's. And then make sure that separate ALB is only used for those ingresses that need it.
Again, when using the AWS Load Balancer Controller instead of the Nginx Ingress Controller, this can be done automatically, without extra maintenance.
I have configured NGINX as a reverse proxy with web sockets enabled for a backend web application with multiple replicas. The request from NGINX does a proxy_pass to a Kubernetes service which in turn load balances the request to the endpoints mapped to the service. I need to ensure that the request from a particular client is proxied to the same Kubernetes back end pod for the life cycle of that access, basically maintaining session persistence.
Tried setting the sessionAffinity: ClientIP in the Kubernetes service, however this does the routing based on the client IP which is of the NGINX proxy. Is there a way to make the Kubernetes service do the affinity based on the actual client IP from where the request originated and not the NGINX internal pod IP ?
This is not an option with Nginx. Or rather it's not an option with anything in userspace like this without a lot of very fancy network manipulation. You'll need to find another option, usually an app-specific proxy rules in the outermost HTTP proxy layer.
Background
I have an in-cluster Kubernetes pod/application that works fine when accessing it via an nginx-ingress ingress controller (requires specific Host HTTP header), but it cannot be accessed by other in-cluster pods/applications (i.e. for testing) due to the pods using different host names (e.g. service-name.namespace.svc.cluster.local) rather than the FQDN of the K8S master (in the LAN).
Plan So Far
I think the only way to (easily) resolve this is to setup an in-cluster forward-proxy nginx instance. Ideally, the service is either a side-car for the pod that needs to have headers re-written, or it needs to be a general in-cluster proxy that multiple services can access.
Question
How would I setup an in-cluster nginx forward proxy service?
Should it be a sidecar, or a general service any pod can access?
Work So Far
The linked "similar" questions don't appear to be helpful for my use case (i.e. don't show how to configure an in-cluster proxy), or are focused on proxying to IPs external to the cluster (i.e. I need to proxy HTTP requests, and re-write their headers, to in-cluster resources).
I've found some interesting reading on the X-Forwarded-* headers, including the Reverse Proxy Request Headers section in the Apache documentation, as well as the Wikipedia article on X-Forwarded-For.
I understand that:
X-Forwarded-For gives the address of the client which connected to the proxy
X-Forwarded-Port gives the port the client connected to on the proxy (e.g. 80 or 443)
X-Forwarded-Proto gives the protocol the client used to connect to the proxy (http or https)
X-Forwarded-Host gives the content of the Host header the client sent to the proxy.
These all make sense.
However, I still can't figure out a real life use case of X-Forwarded-Host. I understand the need to repeat the connection on a different port or using a different scheme, but why would a proxy server ever change the Host header when repeating the request to the target server?
If you use a front-end service like Apigee as the front-end to your APIs, you will need something like X-FORWARDED-HOST to understand what hostname was used to connect to the API, because Apigee gets configured with whatever your backend DNS is, nginx and your app stack only see the Host header as your backend DNS name, not the hostname that was called in the first place.
This is the scenario I worked on today:
Users access certain application server using "https://neaturl.company.com" URL which is pointing to Reverse Proxy. Proxy then terminates SSL and redirects users' requests to the actual application server which has URL of "http://192.168.1.1:5555". The problem is - when application server needed to redirect user to other page on the same server using absolute path, it was using latter URL and users don't have access to this. Using X-Forwarded-Host (+ X-Forwarded-Proto and X-Forwarded-Port) allowed our proxy to tell application server which URL user used originally and thus server started to generate correct absolute path in its responses.
In this case there was no option to stop application server to generate absolute URLs nor configure it for "public url" manually.
I can tell you a real life issue, I had an issue using an IBM portal.
In my case the problem was that the IBM portal has a rest service which retrieves an url for a resource, something like:
{"url":"http://internal.host.name/path"}
What happened?
Simple, when you enter from intranet everything works fine because internalHostName exists but... when the user enter from internet then the proxy is not able to resolve the host name and the portal crashes.
The fix for the IBM portal was to read the X-FORWARDED-HOST header and then change the response to something like:
{"url":"http://internet.host.name/path"}
See that I put internet and not internal in the second response.
For the need for 'x-forwarded-host', I can think of a virtual hosting scenario where there are several internal hosts (internal network) and a reverse proxy sitting in between those hosts and the internet. If the requested host is part of the internal network, the requested host resolves to the reverse proxy IP and the web browser sends the request to the reverse proxy. This reverse proxy finds the appropriate internal host and forwards the request sent by the client to this host. In doing so, the reverse proxy changes the host field to match the internal host and sets the x-forward-host to the actual host requested by the client. More details on reverse proxy can be found in this wikipedia page http://en.wikipedia.org/wiki/Reverse_proxy.
Check this post for details on x-forwarded-for header and a simple demo python script that shows how a web-server can detect the use of a proxy server: x-forwarded-for explained
One example could be a proxy that blocks certain hosts and redirects them to an external block page. In fact, I’m almost certain my school filter does this…
(And the reason they might not just pass on the original Host as Host is because some servers [Nginx?] reject any traffic to the wrong Host.)
X-Forwarded-Host just saved my life. CDNs (or reverse proxy if you'd like to go down to "trees") determine which origin to use by Host header a user comes to them with. Thus, a CDN can't use the same Host header to contact the origin - otherwise, the CDN would go to itself in a loop rather than going to the origin. Thus, the CDN uses either IP address or some dummy FQDN as the Host header fetching content from the origin. Now, the origin may wish to know what was the Host header (aka website name) the content is asked for. In my case, one origin served 2 websites.
Another scenario, you license your app to a host URL then you want to load balance across n > 1 servers.