CGI to Handle Multiple Requests on a Persistent HTTP Connection - http

CGI programs typically get a single HTTP request.
HTTP 1.1 supports persistent HTTP connections whereby multiple HTTP requests/responses are made w/o closing the connection.
Is there a way for a CGI program (or similar mechanism) to handle multiple HTTP requests/responses on the same connection?
I am using Apache httpd.

Keep-alives are one of the higher-level HTTP features that is wholly dealt with by the web server. They are out-of-scope for CGI applications themselves.
Accessing CGI scripts through Apache mod_cgi works with keep-alive for me. The browser re-uses the same TCP connection to fetch the page and then resources referred to by it, without the scripts in question having to do anything special.
If you mean you would like to have the same CGI process handle one request and then the next (instead of the process ending and a new one being spawned), then I'm afraid that's not possible. The web server will intercept keep-alives and make them look like single requests before your scripts can do anything about it. (If you want to do that to improve performance, consider a different gateway interface, such as FastCGI or language-specific options like WSGI.)

SCGI sounds exactly like what you want. It is similar to FastCGI but a simpler solution to implement (the S stands for Simple :)).

Related

Where is http protocol implementation in Linux

I try understand how http work's and can't understand on which level http protocol implemented, it's OS level, or it's depend from where I need use it protocol? For example if I want use it on C I must implement it on C language as library and only then use it?
Http runs on top of tcp - and tcp is implemented in the network stack of your OS.
Http protocol is used between a client and a server. What a client sends is what a server receives, and vice-versa. Http was designed for the server to simply sit and wait for requests (possibly including data), and then respond (possibly including data).
All web servers implement the server side of http. In terms of applications (let's use the term "application" to mean "client", although some might say the server is an application), the client side of http protocol will, I suppose, most commonly be implemented in an application like a browser, but also command-line applications like curl and wget implement an http client. For languages such as Python there is a http server implementation in the standard library, or there are libraries such as requests which handle the client side of http so the python author just worries about the higher-level problem of which http requests to make.
So the answer is, http is not implemented in the OS, it is implemented in applications - some client-side, some server-side.
For your C application you will either have to implement http yourself (doesn't sound like fun to me but would be a good way of understanding http implementation, I suppose) or (much less stress and much more likely to have predictable correctish behaviour) use a library if you can find one.

HTTP client acting as a pseudo-server

Let's say I am going to deploy a server application that's likely to be placed behind a NAT/firewall and I don't want to ask users to tweak their NAT port mapping. In other words, connections to the server are impossible, but my app is a server application by nature, i.e. it sends back objects per URI.
Now, I'm thinking about initiating connections from the server periodically to see what requests are there to be responded to. I'm going to use HTTP via port 80 as something that would likely be working through NAT/firewall from virtually anywhere.
The question is, are there any standard considerations and common practices of implementing a client that can act as a server at the application level, specifically using HTTP? Any special HTTP headers? Design patterns?
E.g. I am thinking about the following scheme:
The client (which is my logical server) sends a dummy HTTP request to the server
The server responds back with non-standard headers X-Request-URI:, X-Host:, X-If-Modified-Since: etc, in other words, request headers wrapped into X-xxx as they are not standard in this situation; also requests to keep the connection alive
The client responds with a POST request that sends the requested object; again, uses wrapped headers (e.g. X-Status:, etc)
Unless there is a more "standard" way of doing something like this, do you think my approach is plausible?
Edit: an interesting discussion took place on reddit here
I've done something similar. This is very common. Client initiate the connection to the Server and keep the connection ALIVE. If the session is shut-down, client would re-initiate. When the session is up, Server can push anything to the client since it's client initiated.

What is the point of sub request at all?

Seems most web server support subrequest.
I found here's one question about sub request:
Subrequest for PHP-CGI
But what is the point of sub request at all,when is that kind of thing really useful?
Is it defined in http protocol?
Apache subrequests can be used in your e.g. PHP application with virtual() to access resources from the same server. The resource gets processed from Apache normally as normal requests are, but you don't have the overhead of sending a full HTTP request on the network interface.
Less overhead is probably the only reason one would want to use it instead of a real HTTP request.
Edit: The resources are processed by apache, which means that the apache modules are used if configured. You can request a mod_perl or mod_ruby processed resource from PHP.

Is it bad practice to select upstream servers based upon the HTTP method?

I'm wondering if it is bad practice to have a reverse proxy that selects the upstream server depending on the HTTP method used?
The background is that I have an abitrary web server that handles POST requests with some logic behind. The same resources also contain static content, that can be retrieved using GET. After some benchmarking I realized that nginx would handle the static content way faster than my abitrary web server doing this.
I checked the option to forward incoming requests internally using nginx, which is feasible.
But this would lead to the fact that different servers would serve a distinct resource, only depending on issuing a GET or POST, including different header fields.
No, it's not bad practice, partitioning the tasks by the nature of the task is perfectly fine, as long as you don't need to store persistent per-user-session data on the server.

HTTP Proxy/FastCGI/SCGI not closing connection when client disconnected - bug or feature?

I'm working on Comet support for CppCMS framework via long XMLHttpRequest polls. In many cases, such request is closed by client before any response from server was given -- for example the page is closed, user moves to other page or it is just refeshed.
At the server side I expect that I would recieve the notification that connection is dropped. I tested the application via 3 connectors: FastCGI, SCGI and simple HTTP Proxy.
From 3 major UNIX web servers, Apache2, lighttpd and Nginx, only the last one had closed
connection as expected allowing my application to remove the request from wait queue -- this worked for both FastCGI and HTTP Proxy connectors. (Nginx does not have scgi module by default).
Others, Apache and Lighttpd do not close connection or inform the backend about disconnected
clients, the proceed as if the client is still on line. This happens for all 3 supported APIs: FastCGI, SCGI and HTTP Proxy.
I had opened an issue for Lighttpd, but what
more conserns me is the fact that Apache -- mature and well supported web server as lighttpd
and does not discloses the server backend that client had gone.
Questions:
Is this a bug or this is a feature? Is there any reason not to close the connection between web server and application backend?
Are there real life Comet application working behind these servers via FastCGI/SCGI/HTTP-Proxy backends?
If the above true, how do they deal with this issue? I understand that I can timeout all connections every 10 seconds, but I would like to keep them idle as far as client listens -- because this allows easier scale up -- each connection is very cheep -- the cost is only the opended socket.
Thanks!
(1) Feature. Or, more specifically, fallout from an implementation detail.
A TCP/IP connection does not involve a constant flow of traffic back and forth. Thus, there is no way to know that a client is gone without (a) the client telling you it is closing the connection or (b) a timeout.
(2) I'm not specifically familiar with Comet or CppCMS. But, yes, there are all kinds of CMS servers running behind the mentioned web servers and they all have to deal with this issue (and, yes, it is a pain).
(3) Timeouts are the only way, but you can mitigate the pain, so to speak. Have the client ping the server across the connection every N seconds when there is otherwise no activity. Doesn't have to do anything and you can tack stuff on the reply; notifications of concurrent edits or whatever you need.
You are correct in that it is surprising that mod_fastcgi doesn't support telling the backend that Apache has detected the disconnect or the connection timed out. And you aren't the first to be dismayed.
The second patch on this page should fix that particular issue:
http://osdir.com/ml/web.fastcgi.devel/2006-02/msg00015.html
http://ncannasse.fr/blog/tora_comet
I don't have any concrete information for you, but this article does mention that they can detect when the client has disconnected from Apache. See tora.Queue. And it sounds like the source is available in the neko CVS, so you might be able to find some clues there. Good luck.

Resources