Nginx: capture POST requests when upstream is offline

I'm using Nginx as a reverse proxy for a Ruby on Rails application.
The application has two critical endpoints that are responsible for capturing data from customers who are registering their details with our service. These endpoints take POST data from a form that may or may not be hosted on our website.
When our application goes down for maintenance (rare, but we have a couple of SPOF services), I would like to ensure the POST data is captured so we don't lose it forever.
Nginx seems like a good place to do this given that it's already responsible for serving requests to the upstream Rails application, and has a custom vhost configuration in place that serves a static page for when we enable maintenance mode. I figured this might be a good place for additional logic to store these incoming POST requests.
The issue I'm having is that Nginx doesn't parse POST data unless you're pointing it at an upstream server. In the case of our maintenance configuration, we're not; we're just rendering a maintenance page. This means that $request_body¹ is empty. We could perhaps get around this by faking a proxy server, or maybe even pointing Nginx at itself and enabling the logger on a particular location. This seems hacky though.
Am I going about this the wrong way? I've done some research and haven't found a canonical way to solve this use-case. Should I be using a third-party tool and not Nginx?
1: from ngx_http_core_module: "The variable’s value is made available in locations processed by the proxy_pass, fastcgi_pass, uwsgi_pass, and scgi_pass directives when the request body was read to a memory buffer."
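
For illustration, here is a minimal sketch of the "pointing Nginx at itself" idea mentioned above (the port, path, and log format are assumptions, not from the question). Proxying to a loopback listener forces nginx to read the request body, which makes $request_body available for logging while the client still gets a maintenance response:

    # Assumed sketch: proxy_pass forces nginx to read the request body,
    # so $request_body is populated and can be written to an access log.
    log_format postdata '$time_iso8601 $remote_addr "$request" $request_body';

    server {
        listen 80;

        location = /register {
            # The body is read into memory for proxying; note that bodies
            # larger than client_body_buffer_size spill to a temp file and
            # will not appear in $request_body (see the footnote above).
            client_body_buffer_size 64k;
            access_log /var/log/nginx/captured_posts.log postdata;
            proxy_pass http://127.0.0.1:8999;
        }
    }

    server {
        # Loopback-only sink whose sole job is to answer for the proxied
        # request while maintenance mode is on.
        listen 127.0.0.1:8999;
        location / {
            return 503;
        }
    }

Whether this is any less hacky than a purpose-built capture service is debatable, but it keeps the logic inside the vhost that already implements maintenance mode.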

Related

nginx reverse proxy based on response header

I have a website that is powered by two web applications under the hood.
One application (fast) is supposed to handle catalog pages.
Another application (slow) is supposed to handle customer/cart/checkout pages.
Both applications should run on the same host:
example.com:80 (fast) and example.com:8000 (slow)
Of course, port 8000 is not exposed to visitors and is used internally by nginx.
I want web requests to reach the slow application only if the fast application returned a specific response header, for example X-catalog-not-found.
The expected flow is the following:
all requests go to the fast application at example.com:80
if the fast application finds a product by URI, it renders the page
if the fast application does not find a product by URI, it sends an empty body and the response header X-catalog-not-found
based on the header received in the previous step, nginx proxies the request to the slow application at example.com:8000
I feel that ngx_http_proxy_module and/or ngx_http_upstream_module should be used, but I haven't found a working solution after reading the docs.
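
A possible sketch, offered as an assumption rather than a confirmed answer: nginx cannot easily branch on an arbitrary response header, but if the fast application can signal a miss with a 404 status instead of (or alongside) X-catalog-not-found, then proxy_intercept_errors together with error_page can re-dispatch the request to the slow application. The internal backend ports below are assumptions:

    server {
        listen 80;
        server_name example.com;

        location / {
            proxy_pass http://127.0.0.1:8080;   # "fast" app on an assumed internal port
            proxy_intercept_errors on;          # let nginx act on upstream error codes
            error_page 404 = @slow;             # re-dispatch misses internally
        }

        location @slow {
            proxy_pass http://127.0.0.1:8000;   # "slow" app
        }
    }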

Will a request to api.myapp.com be slower than a request to api-myapp.herokuapp.com when hosted on Heroku?

I'm trying to understand the best way to handle SOA on Heroku. I've got it into my head that making requests to custom domains will somehow be slower, or would all requests go "out" via the internet?
On previous projects, which were SOA in nature, we had dedicated hosting, so we could make requests like http://blogs/ (obviously on the internal network). I'm wondering if Heroku treats *.herokuapp.com requests as "internal"... or is it clever enough to know that myapp.com is actually myapp.herokuapp.com and route locally? Or am I missing the point completely, and in fact all requests are "external"?
What you are asking about is general knowledge of how internet requests work.
Whenever your application makes a request to, let's say, example.com, the domain name is first translated into an IP address using so-called DNS servers.
So this is how it works: no matter whether you request myapp.com or myapp.herokuapp.com, you always request information from a specific IP address, and the domain name you requested is passed as part of the request headers.
The server that receives the request looks the domain name up in its internal records and handles the request accordingly.
So the conclusion is that it does not matter whether you use myapp.com or myapp.herokuapp.com; the speed of the request will always be the same.
PS: Since Heroku load-balances your requests between the different running instances of myapp.com, the speed here will depend on several factors: how quickly your application responds, how many instances you have running and the average load per instance, and how loaded the load balancer is at the moment. But it certainly will not depend on which domain name you use.

Buffered uploading via HTTP Proxy

I am trying to solve an issue with uploads to our web infrastructure.
When a user uploads media to our site, it is proxied (via our Web Proxy tier) to a Java backend with a limited number of threads. When a user has a slow connection or a large upload, this holds one of the Java threads open for a long period of time, reducing overall capacity.
To mitigate this I'd like to implement an 'upload proxy' which will accept the entire HTTP POST data of the upload and, only once it has received all of the data, quickly proxy that POST to the Java backend, pushing the problem of the upload thread being held open onto an HTTP proxy.
Initially I found Apache Traffic Server has a 'buffer_upload' plugin, but it seems a bit bleeding edge and has no support for regex in URLs, although it would solve most of my issues.
Does anyone know a proxy product that would be able to do what I am suggesting (aside from Apache Traffic Server)?
I see that Nginx has fairly detailed buffer settings for proxying, but it doesn't seem (from the docs/explanations) to wait for the whole POST before opening a backend connection/thread. Do I have this right?
Cheers,
Tim
Actually, nginx always buffers requests before opening a connection to the backend. It is possible to turn off response buffering with the proxy_buffering directive, or per response by setting an X-Accel-Buffering response header.
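
As a hedged illustration of that default request buffering (the directive values, path, and backend port are assumptions):

    # nginx buffers the whole request body by default before connecting
    # upstream, so a slow client upload does not hold a backend thread open.
    location /upload {
        client_max_body_size     100m;              # allow large media uploads
        client_body_buffer_size  1m;                # larger bodies spill to temp files
        client_body_temp_path    /var/nginx/body_temp;

        proxy_pass http://127.0.0.1:8080;           # hypothetical Java backend
        # Response buffering is the separate concern mentioned above; it can
        # be disabled with "proxy_buffering off" or per response via an
        # X-Accel-Buffering header from the backend.
    }

In newer nginx versions the request-side behaviour is also explicit and tunable via proxy_request_buffering, which defaults to on.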

What is the point of sub request at all?

It seems most web servers support subrequests.
I found one question here about subrequests:
Subrequest for PHP-CGI
But what is the point of subrequests at all? When is that kind of thing really useful?
Is it defined in the HTTP protocol?
Apache subrequests can be used in, e.g., your PHP application with virtual() to access resources from the same server. The resource gets processed by Apache just as normal requests are, but you don't have the overhead of sending a full HTTP request over the network interface.
Less overhead is probably the only reason one would want to use it instead of a real HTTP request.
Edit: The resources are processed by Apache, which means that Apache modules are used if configured. You can request a mod_perl- or mod_ruby-processed resource from PHP.
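
For comparison with a non-Apache server (an illustrative sketch, path assumed): nginx implements its SSI includes as internal subrequests, so a page fragment is assembled in-process without a new HTTP round trip:

    location / {
        root /var/www/html;
        ssi  on;   # pages may contain <!--#include virtual="/fragment" -->,
                   # and each include is served as an internal subrequest
    }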

Is it bad practice to select upstream servers based upon the HTTP method?

I'm wondering if it is bad practice to have a reverse proxy that selects the upstream server depending on the HTTP method used?
The background is that I have an arbitrary web server that handles POST requests with some logic behind them. The same resources also contain static content that can be retrieved using GET. After some benchmarking I realized that nginx would handle the static content far faster than my arbitrary web server does.
I checked the option to forward incoming requests internally using nginx, which is feasible.
But this would mean that different servers serve a single resource, depending only on whether a GET or a POST was issued, and with different header fields.
No, it's not bad practice; partitioning tasks by their nature is perfectly fine, as long as you don't need to store persistent per-user session data on the server.
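
For illustration, one way such a split is often configured in nginx (the root, port, and placeholder status code are assumptions): serve GET/HEAD straight from disk and re-dispatch every other method to the application via a named location:

    upstream app_backend {
        server 127.0.0.1:8080;          # hypothetical application server
    }

    server {
        listen 80;

        location / {
            root /var/www/static;       # assumed static content root
            # "return" is safe inside if{}; 418 is just an otherwise unused
            # code that error_page maps onto the named location below.
            error_page 418 = @app;
            if ($request_method !~ ^(GET|HEAD)$) {
                return 418;
            }
        }

        location @app {
            proxy_pass http://app_backend;
        }
    }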
