nginx reverse proxy based on response header

I have a website that is powered by 2 web applications under the hood.
One application (fast) is supposed to handle catalog pages.
Another application (slow) is supposed to handle customer/cart/checkout pages.
Both applications should run on the same host:
example.com:80 (fast) and example.com:8000 (slow)
Of course, port 8000 is not exposed to visitors and is used internally by nginx.
I want web requests to reach the slow application only if the fast application returned a specific response header, for example X-catalog-not-found.
The expected flow is the following:
all requests go to the fast application at example.com:80
if the fast application finds a product for the URI, it renders the page
if the fast application does not find a product for the URI, it sends an empty body and the response header X-catalog-not-found
based on the header received in the previous step, nginx proxies the request to the slow application at example.com:8000
I feel that ngx_http_proxy_module and/or ngx_http_upstream_module should be used, but I haven't found a working solution after reading the docs.
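Out of the box, nginx cannot branch on an arbitrary upstream response header without a scripting module such as njs or Lua, but two built-in mechanisms come close. Below is a minimal, untested sketch of both; it assumes the fast application is moved to an internal port (8080 here), since nginx itself must own port 80. Option A has the fast application return a 404 status instead of the custom header, which nginx can intercept; option B uses X-Accel-Redirect, a response header nginx acts on natively.

    server {
        listen 80;
        server_name example.com;

        location / {
            # "fast" app; assumed to listen internally on 8080
            proxy_pass http://127.0.0.1:8080;

            # Option A: the fast app answers 404 when no product matches,
            # and nginx retries the request against the slow app.
            proxy_intercept_errors on;
            error_page 404 = @slow;
        }

        location @slow {
            # "slow" app
            proxy_pass http://127.0.0.1:8000;
        }

        # Option B: instead of X-catalog-not-found, the fast app replies
        # with "X-Accel-Redirect: /slow$request_uri"; nginx then performs
        # an internal redirect into this location.
        location /slow/ {
            internal;
            proxy_pass http://127.0.0.1:8000/;   # trailing slash strips /slow/
        }
    }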

Related

Nginx: capture post requests when upstream is offline

I'm using Nginx as a reverse proxy for a Ruby on Rails application.
The application has two critical endpoints that are responsible for capturing data from customers who are registering their details with our service. These endpoints take POST data from a form that may or may not be hosted on our website.
When our application goes down for maintenance (rare, but we have a couple of SPOF services), I would like to ensure the POST data is captured so we don't lose it forever.
Nginx seems like a good place to do this given that it's already responsible for serving requests to the upstream Rails application, and has a custom vhost configuration in place that serves a static page for when we enable maintenance mode. I figured this might be a good place for additional logic to store these incoming POST requests.
The issue I'm having is that Nginx doesn't parse POST data unless you're pointing it at an upstream server. In the case of our maintenance configuration, we're not; we're just rendering a maintenance page. This means that $request_body¹ is empty. We could perhaps get around this by faking a proxy server, or maybe even pointing Nginx at itself and enabling the logger on a particular location. This seems hacky though.
Am I going about this the wrong way? I've done some research and haven't found a canonical way to solve this use-case. Should I be using a third-party tool and not Nginx?
1: from ngx_http_core_module: "The variable’s value is made available in locations processed by the proxy_pass, fastcgi_pass, uwsgi_pass, and scgi_pass directives when the request body was read to a memory buffer."
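For what it's worth, here is a rough, untested sketch of the "pointing Nginx at itself" idea from the question. Proxying forces nginx to read the request body, so $request_body becomes available on the proxying location and can be written to a log for later replay. The endpoint path, port, and file names are made up:

    # log the body of captured POSTs (http context)
    log_format capture '$time_iso8601 $remote_addr $request_uri $request_body';

    server {
        listen 80;

        location /registrations {
            # $request_body is populated here because proxy_pass reads the
            # body into a buffer; bodies larger than client_body_buffer_size
            # spill to a temp file and the variable stays empty, so size
            # the buffer accordingly.
            client_body_buffer_size 256k;
            access_log /var/log/nginx/captured_posts.log capture;
            proxy_pass http://127.0.0.1:8081;
        }

        location / {
            root /var/www/maintenance;   # the usual static maintenance page
            try_files /index.html =503;
        }
    }

    server {
        # the local "fake" upstream: accepts the POST, returns maintenance
        listen 127.0.0.1:8081;
        return 503;
    }

It is hacky, as you say. Newer nginx versions also provide the mirror directive (ngx_http_mirror_module) for copying traffic to a second backend, though durably persisting POSTs is arguably a job for a small queue-backed service rather than for Nginx.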

How to make a fault-tolerant system that can immediately handle a server going down

Before XYZ.com went down, I noticed that my requests were being routed to IP address 192.33.31.xxx, and when it came back up they were routed to IP 50.17.196.xxx. Is this some sort of server switching? Isn't dynamic server switching in case of a fault a way to make a fault-tolerant system?
Most heavily used websites work with load balancers directing requests to multiple servers. The routing can be based on anything from very simple to very complex logic.
For example, the logic can be:
once a request has been handled by one particular server, requests from the same IP are handled by the same server, OR
use a round-robin way of handling requests (in this case the client IP plays no role), OR
route requests to the least-used server, etc.
A website can also use a hybrid approach combining all of the above.
Now, if the website you were accessing was using IP-based routing, the sequence would have been:
On your first request you are assigned a server to handle it (by round robin or some other logic, e.g. the server with the least load).
The load balancer might then have logic saying that the next X (say 100) requests, or all requests for a period of Y (say 1 hour), from the same IP will be routed to the same server, or both.
So if you make a request to XYZ it routes to some server a.b.c.d, and then if you make another call after 1 hour, or make the 101st call, you might get a different server handling your request.
Now, if a server goes down, you can have fallback mechanisms like:
If a server does not respond within some preconfigured time period, remove that server from the live server list in the load balancer configuration and reroute the request.
Or keep idle, pre-running standby servers (costly, but worth it when every request must be served without fail). As soon as a server goes down, bring up a standby and assign it the IP that was assigned to the old server that went down.
There can be better and much more complex solutions to this problem, though.
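As a concrete (hypothetical) illustration of these ideas in nginx terms: max_fails/fail_timeout removes an unresponsive server from the live list, a backup server is the idle standby brought in on failure, and ip_hash is the "same IP, same server" option:

    upstream backend {
        # ip_hash;   # uncomment to pin a client IP to one server
        #            # (nginx disallows combining ip_hash with "backup")
        server 10.0.0.1 max_fails=3 fail_timeout=30s;
        server 10.0.0.2 max_fails=3 fail_timeout=30s;
        server 10.0.0.3 backup;   # idle standby, used only when the others are down
    }

    server {
        listen 80;

        location / {
            proxy_pass http://backend;
            proxy_next_upstream error timeout;   # reroute a failed request
        }
    }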

Will a request to api.myapp.com be slower than a request to api-myapp.herokuapp.com when hosted on Heroku?

I'm trying to understand the best way to handle SOA on Heroku. I've got it into my head that making requests to custom domains will somehow be slower, or that all requests would go "out" via the internet.
On previous projects which were SOA in nature we had dedicated hosting, so we could make requests like http://blogs/ (obviously on the internal network). I'm wondering if Heroku treats *.herokuapp.com requests as "internal"... Or is it clever enough to know that myapp.com is actually myapp.herokuapp.com and route locally? Or am I missing the point completely, and in fact all requests are "external"?
What you are asking about is general knowledge of how internet requests work.
Whenever your application makes a request to, let's say, example.com, the domain name is first translated into an IP address using DNS servers.
So this is how it works: whether you request myapp.com or myapp.herokuapp.com, you always request information from a specific IP address, and the domain name you requested is passed along as part of the request headers (the Host header).
The server which receives the request then tries to find that domain name in its internal records and handles the request accordingly.
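(To make that lookup concrete: Heroku's routing layer does this internally, but the same mechanism looks like the following in an nginx config, where the server block is chosen by matching the incoming Host header against server_name. One IP, one listener, two "records":)

    server {
        listen 80;
        server_name myapp.com;
        return 200 "matched the myapp.com record\n";
    }

    server {
        listen 80;
        server_name myapp.herokuapp.com;
        return 200 "matched the myapp.herokuapp.com record\n";
    }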
So the conclusion is that it does not matter whether you use myapp.com or myapp.herokuapp.com; the speed of the request will be the same.
PS: Since Heroku load-balances your requests between the different instances of your running myapp.com, the speed will depend on several factors: how quickly your application responds, how many instances you have running and the average load per instance, and how loaded the load balancer is at the moment. But it will certainly not depend on which domain name you use.

Buffered uploading via HTTP Proxy

I am trying to solve an issue with uploads to our web infrastructure.
When a user uploads media to our site, it is proxied (via our web proxy tier) to a Java backend with a limited number of threads. When a user has a slow connection or a large upload, this holds one of the Java threads open for a long period of time, reducing overall capacity.
To mitigate this I'd like to implement an 'upload proxy' which will accept the entire HTTP POST data of the upload, and only when it has received all of the data will it proxy that POST to the Java backend quickly, pushing the problem of the upload thread being held open onto an HTTP proxy.
Initially I found Apache Traffic Server has a 'buffer_upload' plugin, but it seems a bit bleeding edge and has no support for regex in URLs, although it would solve most of my issues.
Does anyone know a proxy product that would be able to do what I am suggesting (aside from Apache Traffic Server)?
I see that Nginx has fairly detailed buffer settings for proxying, but it doesn't seem (from the docs/explanations) to wait for the whole POST before opening a backend connection/thread. Do I have this right?
Cheers,
Tim
Actually, nginx always buffers the request before opening a connection to the backend. It is possible to turn off response buffering with the proxy_buffering directive, or per response by having the backend set an X-Accel-Buffering response header.
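In current nginx versions, request buffering is controlled explicitly by proxy_request_buffering (available since 1.7.11, on by default). A minimal sketch of the relevant directives, with an assumed backend address:

    location /upload {
        client_max_body_size      200m;   # allow large media uploads
        client_body_buffer_size   1m;     # in-memory buffer; larger bodies go to a temp file
        proxy_request_buffering   on;     # default: read the whole POST before connecting
        proxy_buffering           on;     # response buffering, controlled separately
        proxy_pass http://127.0.0.1:8080; # assumed Java backend
    }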

Where does HttpContext.Current.Request.Url.Host come from?

Why does HttpContext.Current.Request.Url.Host return a different host than the URL used in the web browser? For example, when entering "www.someurl.com" in the browser, HttpContext.Current.Request.Url.Host is equal to "www.someotherurl.com".
HttpContext.Current.Request.Url.Host is the contents of the Host header that the ASP.NET application receives (see http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html for more info about HTTP headers like Host).
Usually the header that ASP.NET sees is identical to the Host header sent by the browser. However, they may not match if software or hardware sitting between the browser and your ASP.NET code is rewriting the Host header.
For example, large-scale budget hosters like GoDaddy do this so they can support multiple top-level domains on a single IIS website, even on their cheaper hosting plans. Instead of creating a separate IIS website (which adds to server load and hence cost), GoDaddy remaps requests for http://secondsite.com/ to a virtual directory on your "main" hosted site, e.g. http://firstsite.com/secondsite. They will change both the Host header and the URL.
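A hypothetical sketch of that kind of middle-box in nginx terms (the hostnames and backend address are made up), showing how a proxy can rewrite the Host header before your application ever sees it:

    server {
        listen 80;
        server_name secondsite.com;

        location / {
            proxy_set_header Host firstsite.com;    # the backend sees firstsite.com
            proxy_pass http://10.0.0.5/secondsite/; # assumed backend; the URL is remapped too
        }
    }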
BTW, you can easily verify that this is what's happening by dumping the contents of the HTTP request headers your app is receiving.
Anyway, if you want to figure out who is changing the Host header, start with the people who host your web app (or the team responsible for your load balancer and/or reverse proxy), since they're likely the ones rewriting it.
