What is the point of a subrequest at all? - http

Most web servers seem to support subrequests.
I found one question about subrequests here:
Subrequest for PHP-CGI
But what is the point of a subrequest at all? When is that kind of thing really useful?
Is it defined in the HTTP protocol?

Apache subrequests can be used from your application, e.g. in PHP with virtual(), to access resources on the same server. Apache processes the resource just as it would a normal request, but you avoid the overhead of sending a full HTTP request over the network interface.
Lower overhead is probably the only reason to use a subrequest instead of a real HTTP request.
Edit: The resources are processed by Apache, which means that any configured Apache modules are applied. For example, you can request a resource handled by mod_perl or mod_ruby from PHP.
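To make the idea concrete, here is a minimal sketch in Python rather than Apache or PHP. It is not Apache's implementation and not the virtual() API; ROUTES, route, handle, subrequest and the two handlers are hypothetical names made up for illustration. It only shows the shape of the thing: a handler asks the same server's dispatcher for another resource in-process, so the normal handler pipeline runs but no socket is opened.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]          # (status code, body)
Handler = Callable[[str], Response]

ROUTES: Dict[str, Handler] = {}     # hypothetical routing table: path -> handler


def route(path: str) -> Callable[[Handler], Handler]:
    """Register a handler for a path."""
    def register(fn: Handler) -> Handler:
        ROUTES[path] = fn
        return fn
    return register


def handle(path: str) -> Response:
    """Dispatch a request the same way for external and internal callers."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, "not found"
    return handler(path)


def subrequest(path: str) -> str:
    """Fetch another resource from the same server without touching the network."""
    status, body = handle(path)
    return body if status == 200 else ""


@route("/footer")
def footer(path: str) -> Response:
    return 200, "<footer>served by another handler</footer>"


@route("/page")
def page(path: str) -> Response:
    # The page embeds the output of /footer via an in-process subrequest.
    return 200, "<h1>page</h1>" + subrequest("/footer")
```

Calling handle("/page") runs both handlers through the same dispatcher, which is the property the answer describes: the embedded resource is processed like any other request, minus the network round trip.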

Related

Nginx: capture post requests when upstream is offline

I'm using Nginx as a reverse proxy for a Ruby on Rails application.
The application has 2 critical endpoints that are responsible for capturing data from customers who're registering their details with our service. These endpoints take POST data from a form that may or may not be hosted on our website.
When our application goes down for maintenance (rare, but we have a couple of SPOF services), I would like to ensure the POST data is captured so we don't lose it forever.
Nginx seems like a good place to do this given that it's already responsible for serving requests to the upstream Rails application, and has a custom vhost configuration in place that serves a static page for when we enable maintenance mode. I figured this might be a good place for additional logic to store these incoming POST requests.
The issue I'm having is that Nginx doesn't parse POST data unless you're pointing it at an upstream server. In the case of our maintenance configuration, we're not; we're just rendering a maintenance page. This means that $request_body¹ is empty. We could perhaps get around this by faking a proxy server, or maybe even pointing Nginx at itself and enabling the logger on a particular location. This seems hacky though.
Am I going about this the wrong way? I've done some research and haven't found a canonical way to solve this use-case. Should I be using a third-party tool and not Nginx?
1: from ngx_http_core_module: "The variable’s value is made available in locations processed by the proxy_pass, fastcgi_pass, uwsgi_pass, and scgi_pass directives when the request body was read to a memory buffer."

Nginx - Reacting to Upstream Response

I am using nginx as reverse proxy for file storage upload with an external provider.
When I am processing a file upload, I need to record in my database whether the upload was successful before returning the response to the user. I would therefore like to use the ngx.location.capture method provided by lua-nginx-module to tell my backend about the outcome of the request. Since I need to wait for the upstream server's response, the only place I can issue the capture is header_filter_by_lua. Unfortunately, I cannot issue any outbound communication in header_filter_by_lua: ngx.location.capture, ngx.socket.* and ngx.exec are only available before the response has arrived.
How can I react to an upstream response in nginx?
Other approaches I've thought about:
Have a script watch the access log and then issue a curl request. (Seems like there should be an easier way)
Initially send the file via ngx.location.capture in content_by_lua (I don't think this would handle up to 5 GB filesize)
Help is appreciated :)
For the /upload location, use content_by_lua_file with the resty.upload module.

Understanding the php pipeline when using nginx and php-fpm

So I'm trying to understand how the PHP pipeline works from request to response, specifically when using nginx and php-fpm.
I'm coming from a Java/.NET background, so normally once the process is sent the request it uses threads etc. to handle the request/response cycle.
With PHP/nginx, I noticed the fpm process is set up like:
location / {
include /path/to/php-fpm;
}
Here are a few questions I have:
When nginx receives a request, does php-fpm take over? If so, at what point?
Does each request spawn another process/thread?
When you change a PHP source file, do you have to reload? If not, does that mean the source code is parsed each time a request comes in?
Any other interesting points about how a PHP request is served would be great.
The configuration in your post is irrelevant, as include /path/to/php-fpm; merely includes a fragment of nginx configuration.
php-fpm doesn't take over anything: nginx passes the request to php-fpm with fastcgi_pass and waits for the reply to come back, serving other requests in the meantime.
Nginx uses the reactor pattern, so requests are served by a limited number of processes (usually one per CPU core available on the machine). It's an event-driven web server that uses event polling to handle many requests asynchronously in each process. php-fpm, on the other hand, uses a process pool to execute PHP code.
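No, you don't have to reload, because there's no caching anywhere unless you set up caching headers for the browser or a server-side cache. Without an opcode cache, PHP compiles the script on every request, so a change takes effect on the next request. OS file caching only saves the disk read for an unchanged, frequently accessed file; it does not skip parsing. Parsing is skipped only when an opcode cache such as OPcache is enabled, and with timestamp validation turned on it recompiles a file once its content changes.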
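The split described above (an event loop in front, a worker pool behind) can be sketched in Python. This is an illustration of the pattern only, not how nginx or php-fpm are implemented: run_script, handle_request, serve and main are hypothetical names, and a thread pool stands in for php-fpm's process pool so the sketch stays self-contained.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_script(script: str) -> str:
    """Stand-in for a pool worker executing one script (blocking work)."""
    return f"output of {script}"


async def handle_request(pool: ThreadPoolExecutor, script: str) -> str:
    # Hand the blocking work to the pool; the event loop stays free to
    # accept and serve other requests while this one waits for its reply.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, run_script, script)


async def serve(scripts: list[str], workers: int = 2) -> list[str]:
    # One event loop multiplexes many in-flight requests over a small pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return await asyncio.gather(*(handle_request(pool, s) for s in scripts))


def main() -> list[str]:
    return asyncio.run(serve(["a.php", "b.php", "c.php"]))
```

Three requests are in flight at once here while only two workers exist, which mirrors the point that the front end never blocks on a single slow script.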

Varnish + Static HTML Pages

I've recently come across an HTTP web accelerator called Varnish. From what I've read, Varnish speeds up delivery of a website by optimizing every process of HTTP communication with the HTTP server using a reverse proxy configuration.
My question is: if you have a website whose caching mechanism is configured all the way down to static HTML files, how much more of an effect will Varnish have? Does a reverse proxy cut down the work that the HTTP server performs to process the request? If you have everything extensively cached on the server side (HTTP headers, ETags, Expires headers, database caching, fragment and page caching), what more will an HTTP accelerator do to improve on this?
Firstly, we should differentiate between two different types of caching that go on in a normal web system: HTTP caching and server-side caching.
HTTP caching is controlled by HTTP headers, notably as you point out ETag and the various expiry mechanisms (including Expires and various aspects of Cache-Control). This is all covered in RFC 2616 (HTTP), section 13, and allows HTTP caches to return a response to an HTTP request from a client without having to go back to the origin server. In effect, the HTTP caching mechanism allows another machine between client and server to act as if it's the server, in certain cases. This is actually what varnish is doing, as we'll see in a minute; another common use that many people are familiar with is when ISPs provide an HTTP cache within their network, that can generally respond faster to their subscribers (and so improve perceived performance) than the origin servers outside their network.
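As a rough illustration of how ETag-based validation works, here is a simplified Python sketch. The function names and the hashing scheme are my own choices, and it assumes a single validator with an exact match; real If-None-Match headers can carry a list of tags and weak validators, which this ignores.

```python
import hashlib
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Derive a validator from the body; the hashing scheme is an arbitrary choice."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers: Dict[str, str], body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    # If the cache already holds this exact version, confirm it with 304 and no body.
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""
    return 200, headers, body
```

A cache that stored the first response can replay its ETag in If-None-Match; as long as the body is unchanged it gets a bodiless 304 back and keeps serving its stored copy.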
Server-side caching includes database caching, and fragment and page caching, which are really all just ways of the web server avoiding doing some expensive operation (say, a database query, or rendering a particular piece of a template) by doing it once then keeping the result in a cache for a while.
I said earlier that varnish was an HTTP cache, which means that straight away it's able to be more efficient than a web server serving even a static file. Consider what a web server has to do:
1. parse the HTTP request
2. map the URI (and any relevant request headers, such as Accept-Encoding) onto a file
3. pull up information about the file to build the HTTP headers in the response; these are known as entity headers (RFC 2616 section 7.1, which include things such as Content-Length, Content-Type and the Expires and Last-Modified headers used in HTTP caching)
4. figure out what additional response headers (RFC 2616 section 6.2; these include ETag and Vary, both important parts of HTTP caching) and general header fields (RFC 2616 section 4.5) are needed
5. write the HTTP status line and headers out to the network
6. write the file's contents out to the network
By comparison, varnish sits in front of all of this, so all it has to do is:
1. parse the HTTP request
2. map the URI (and any relevant request headers) onto an entry in its internal cache
3. see if there's an entry; if there is, write it to the network; the HTTP headers will have been stored in the cache
If there isn't an entry, varnish has to do a little more work:
1. connect to a web server behind it that will run through all the steps 1-6 in the first list to generate a response
2. write the response to the network, including all the HTTP headers
3. store the response in its cache
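The hit and miss paths above can be sketched in a few lines of Python. This is a toy model, not Varnish's data structures or VCL: TinyHttpCache and slow_backend are hypothetical names, the cache key is assumed to be the URI plus Accept-Encoding, and expiry, Vary handling and cacheability rules are left out entirely.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]   # status, headers, body
CacheKey = Tuple[str, str]                      # (URI, Accept-Encoding)


class TinyHttpCache:
    def __init__(self, backend: Callable[[str, Dict[str, str]], Response]) -> None:
        self.backend = backend
        self.store: Dict[CacheKey, Response] = {}
        self.backend_calls = 0

    def fetch(self, uri: str, request_headers: Dict[str, str]) -> Response:
        key = (uri, request_headers.get("Accept-Encoding", ""))
        cached = self.store.get(key)
        if cached is not None:
            return cached           # hit: headers and body come straight from the cache
        self.backend_calls += 1     # miss: the backend runs its full request pipeline
        response = self.backend(uri, request_headers)
        self.store[key] = response
        return response


def slow_backend(uri: str, request_headers: Dict[str, str]) -> Response:
    """Stand-in for the web server behind the cache."""
    body = f"contents of {uri}".encode()
    return 200, {"Content-Length": str(len(body)), "Content-Type": "text/plain"}, body
```

Fetching the same URI twice with the same Accept-Encoding reaches the backend once; the second response, headers included, is a dictionary lookup.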
In particular because the HTTP headers and entity body (the entire response) can be cached by varnish, if it can serve out of its cache it has less work to do. When you start generating the response dynamically in your server, the difference can become even more pronounced: say you have a page that takes 5 seconds to generate, but is the same for everyone hitting your site, varnish should be able to serve that in at most milliseconds out of the cache (plus whatever time it takes to get the response across the network to the HTTP client), and has a neat mechanism (the grace period) so it can keep on doing it while hitting the backend server once to refresh the cached version of the page.
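The grace idea can be sketched like this. It is a simplified model under my own assumptions, not Varnish's implementation: GraceCache, Entry and run_refreshes are hypothetical names, the ttl and grace values are arbitrary, and the background fetch is replaced by an explicit run_refreshes() call so the behaviour is easy to follow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass
class Entry:
    body: str
    stored_at: float


class GraceCache:
    def __init__(self, backend: Callable[[str], str], clock: Callable[[], float],
                 ttl: float = 10.0, grace: float = 60.0) -> None:
        self.backend, self.clock = backend, clock
        self.ttl, self.grace = ttl, grace
        self.entries: Dict[str, Entry] = {}
        self.refreshing: Set[str] = set()   # URIs with a refresh already queued

    def _store(self, uri: str) -> str:
        body = self.backend(uri)
        self.entries[uri] = Entry(body, self.clock())
        return body

    def fetch(self, uri: str) -> str:
        entry = self.entries.get(uri)
        if entry is None:
            return self._store(uri)              # nothing cached: wait for the backend
        age = self.clock() - entry.stored_at
        if age <= self.ttl:
            return entry.body                    # fresh hit
        if age <= self.ttl + self.grace:
            self.refreshing.add(uri)             # queue at most one refresh per URI
            return entry.body                    # serve the stale copy immediately
        return self._store(uri)                  # too old even for grace

    def run_refreshes(self) -> None:
        """Stand-in for the background fetch: refresh each queued URI once."""
        for uri in list(self.refreshing):
            self._store(uri)
        self.refreshing.clear()
```

However many clients ask for an expired page during the grace window, they all get the stale copy right away and the backend is asked for a new version only once.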
Of course, you can introduce server-side caching to improve the speed with which your web server can process a request, but if you have a response you can cache in varnish it's generally going to be faster to do that. (There are various things that are hard to cache in varnish, particularly if you're using cookies or have pages that change depending on which user is looking at them. While it's possible to continue using varnish in these cases, unless you need really incredible speed, as far as I'm aware most people start optimising those cases using server-side caching and other techniques before hitting up varnish.)
(Note that varnish can also edit headers and indeed data going in and out of the cache, which complicates things. But the main points still stand, and even while editing things on the fly varnish can be incredibly fast.)

CGI to Handle Multiple Requests on a Persistent HTTP Connection

CGI programs typically get a single HTTP request.
HTTP 1.1 supports persistent HTTP connections whereby multiple HTTP requests/responses are made without closing the connection.
Is there a way for a CGI program (or similar mechanism) to handle multiple HTTP requests/responses on the same connection?
I am using Apache httpd.
Keep-alives are one of the higher-level HTTP features that are wholly dealt with by the web server. They are out of scope for CGI applications themselves.
Accessing CGI scripts through Apache mod_cgi works with keep-alive for me. The browser re-uses the same TCP connection to fetch the page and then resources referred to by it, without the scripts in question having to do anything special.
If you mean you would like to have the same CGI process handle one request and then the next (instead of the process ending and a new one being spawned), then I'm afraid that's not possible. The web server will intercept keep-alives and make them look like single requests before your scripts can do anything about it. (If you want to do that to improve performance, consider a different gateway interface, such as FastCGI or language-specific options like WSGI.)
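For a feel of what a persistent gateway interface buys you, here is a small WSGI sketch in Python. The app and call names and the counter are hypothetical; call() drives the application in-process the way a WSGI server would, and says nothing about how Apache wires it up.

```python
from typing import Callable, Dict, Iterable, List, Tuple
from wsgiref.util import setup_testing_defaults

requests_served = 0   # survives between requests because the process stays alive


def app(environ: Dict[str, object], start_response: Callable) -> Iterable[bytes]:
    """WSGI application: called once per request inside one long-lived process."""
    global requests_served
    requests_served += 1
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"request #{requests_served} for {environ['PATH_INFO']}".encode()]


def call(path: str) -> Tuple[str, bytes]:
    """Drive the app in-process the way a WSGI server would, one request at a time."""
    environ: Dict[str, object] = {"PATH_INFO": path}
    setup_testing_defaults(environ)   # fills in the remaining required WSGI keys
    status: List[str] = []
    body = b"".join(app(environ, lambda s, headers: status.append(s)))
    return status[0], body
```

With plain CGI the counter would restart with every request, because each request gets a fresh process. Here the second call sees the state left by the first, and the application still never touches the keep-alive handling itself.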
SCGI sounds exactly like what you want. It is similar to FastCGI but a simpler solution to implement (the S stands for Simple :)).
