HTTP client acting as a pseudo-server

Let's say I am going to deploy a server application that's likely to be placed behind a NAT/firewall and I don't want to ask users to tweak their NAT port mapping. In other words, connections to the server are impossible, but my app is a server application by nature, i.e. it sends back objects per URI.
Now, I'm thinking about initiating connections from the server periodically to see what requests are there to be responded to. I'm going to use HTTP via port 80 as something that would likely be working through NAT/firewall from virtually anywhere.
The question is, are there any standard considerations and common practices of implementing a client that can act as a server at the application level, specifically using HTTP? Any special HTTP headers? Design patterns?
E.g. I am thinking about the following scheme:
The client (which is my logical server) sends a dummy HTTP request to the server
The server responds with non-standard headers (X-Request-URI:, X-Host:, X-If-Modified-Since:, etc.), i.e. request headers wrapped as X-xxx since they are not standard in this situation; it also asks to keep the connection alive
The client responds with a POST request that sends the requested object, again using wrapped headers (e.g. X-Status:, etc.)
Unless there is a more "standard" way of doing something like this, do you think my approach is plausible?
Edit: an interesting discussion took place on reddit here

I've done something similar. This is very common. The client initiates the connection to the server and keeps the connection alive. If the session is shut down, the client re-initiates it. While the session is up, the server can push anything to the client, since the connection is client-initiated.
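As a rough illustration of the scheme in the question, here is a minimal Python sketch of the polling client (the "logical server" behind NAT). The rendezvous URL, the /poll and /reply paths and the serve_object helper are hypothetical, and the third-party requests library is assumed; only the X- header names come from the question:

    import requests  # third-party; pip install requests

    RENDEZVOUS = "http://rendezvous.example.com"  # hypothetical public relay

    def serve_object(uri):
        # Stand-in for the app's real lookup of the object for this URI.
        return b"object for " + uri.encode()

    def poll_once():
        # Step 1: the NATed "logical server" sends the dummy request.
        r = requests.get(RENDEZVOUS + "/poll", timeout=30)
        uri = r.headers.get("X-Request-URI")
        if uri is None:
            return  # no pending request this round
        # Step 2: answer with a POST carrying the object and wrapped headers.
        requests.post(RENDEZVOUS + "/reply",
                      data=serve_object(uri),
                      headers={"X-Status": "200", "X-Request-URI": uri},
                      timeout=30)

    while True:
        poll_once()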

Related

What is the best way to redirect network requests?

I've written my own HTTP Server, but given certain criteria, I want to redirect some requests made to my server to another server running on the same machine. For example, I may want to redirect all requests to "/foo/*" to be handled by an apache server I also have running. What is the best way to do this?
The only way I can think of doing this is by running apache on a different port, and then making a completely new network request from my server to localhost:1234 (assuming apache is running on port 1234) with the same exact request headers and body, and then take the response and have my server send that back to the client.
That seems like a kind of hacky, roundabout way of accomplishing this though, and I'm sure this is a problem that is tackled by every major website. Is there a certain technology or protocol for doing this that I just haven't heard of?
Thanks a lot!
Edit: Just to be clear, the client should only make one network request for all this, rather than having my server return a 3xx response
HTTP runs over TCP. The Apache server can't just send the required response to a client who hasn't asked for it. The client has asked YOUR HTTP server for the data and so it must be the one to send a response. The client is probably behind a firewall and, as such, the Apache server can't even establish a TCP connection with it (incoming connections are usually blocked).
If your server takes the client's request, forwards it to the Apache server, gets the response from the Apache server and forwards it to the client, it's acting as a proxy server (a middleman). This won't be redirection.
The only sensible way to do this would be to have the client make two network requests.
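For reference, if you did go the proxy route described above, a minimal sketch might look like this in Python (assuming Apache on localhost:1234 as in the question, GET requests only, full buffering, no streaming):

    import http.client
    from http.server import BaseHTTPRequestHandler, HTTPServer

    BACKEND = ("localhost", 1234)  # Apache, per the question

    class ProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if not self.path.startswith("/foo/"):
                self.send_response(404)
                self.end_headers()
                return
            # Replay the request against the backend on a fresh connection.
            backend = http.client.HTTPConnection(*BACKEND)
            backend.request("GET", self.path, headers=dict(self.headers))
            resp = backend.getresponse()
            body = resp.read()
            # Relay status, headers and body back on the client's own
            # connection, skipping hop-by-hop headers from our leg.
            self.send_response(resp.status)
            for name, value in resp.getheaders():
                if name.lower() not in ("connection", "transfer-encoding"):
                    self.send_header(name, value)
            self.end_headers()
            self.wfile.write(body)
            backend.close()

    HTTPServer(("", 8080), ProxyHandler).serve_forever()

The client makes a single network request to your server; the second connection exists only between your server and Apache.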

HTTP and Sessions

I just went through the specification of http 1.1 at http://www.w3.org/Protocols/rfc2616/rfc2616.html and came across a section about connections http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8 that says
" A significant difference between HTTP/1.1 and earlier versions of HTTP is that persistent connections are the default behavior of any HTTP connection. That is, unless otherwise indicated, the client SHOULD assume that the server will maintain a persistent connection, even after error responses from the server.
Persistent connections provide a mechanism by which a client and a server can signal the close of a TCP connection. This signaling takes place using the Connection header field (section 14.10). Once a close has been signaled, the client MUST NOT send any more requests on that connection. "
Then I also went through a section on http state management at https://www.rfc-editor.org/rfc/rfc2965 that says in its section 2 that
"Currently, HTTP servers respond to each client request without relating that request to previous or subsequent requests;"
A section in RFC 2616 about the need for persistent connections also said that, prior to persistent connections, a client had to establish a new TCP connection for each and every request.
Now my question is: if we have persistent connections in HTTP/1.1, then as mentioned above a client does not need to make a new connection for every request; it can send multiple requests over the same connection. So if the server knows that every subsequent request is coming over the same connection, wouldn't it be obvious that the requests are from the same client? Wouldn't this suffice to maintain state, and be enough for the server to understand that the request was from the same client? In that case, why is a separate state-management mechanism required at all?
Basically, yes, it would make sense, but HTTP persistent connections are used to eliminate administrative TCP/IP overhead of connection handling (e.g. connect/disconnect/reconnect, etc.). It is not meant to say anything about the state of the data moving across the connection, which is what you're talking about.
No. For instance, there might be an intermediary (such as a proxy or a reverse proxy) in the request path that aggregates requests from multiple TCP connections.
See http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-21.html#intermediaries.
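To see connection reuse concretely, here is a small Python sketch using the standard http.client module: two requests share one TCP connection, yet nothing lets the server conclude they come from the same end user, since an intermediary could be multiplexing many users onto that one socket:

    import http.client

    # One TCP connection, two independent HTTP requests reusing it.
    conn = http.client.HTTPConnection("example.com")

    conn.request("GET", "/")
    first = conn.getresponse()
    first.read()   # drain the body before reusing the connection

    conn.request("GET", "/about")
    second = conn.getresponse()
    second.read()

    conn.close()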

stateless protocol and stateful protocol

How should one understand stateless and stateful protocols? HTTP is a stateless protocol and FTP is a stateful protocol. For web applications requiring a lot of interaction, the underlying protocol should be a stateful one. Is my understanding right?
HTTP is a stateless protocol; in other words, the server forgets everything related to the client/browser state, although web applications have made it look virtually stateful.
A stateless protocol can be forced to behave as if it were stateful. This can be accomplished if the server sends the state to the client, and the client sends it back to the server, every time.
There are three ways this may be accomplished in HTTP:
a) One is cookies, in which case the state is sent and returned in HTTP headers; a sketch of this follows the list.
b) The second is URL extension, in which case the state is sent as part of the request URL.
c) The third is "hidden form fields", in which the state is sent to the client as part of the response, and returned to the server as part of a form's hidden data.
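A minimal sketch of option (a) using only Python's standard library: the server hands the client an identifier in Set-Cookie, and the client returns it in the Cookie header on every subsequent request (the sid name and port are arbitrary):

    import uuid
    from http.cookies import SimpleCookie
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class CookieStateHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            cookie = SimpleCookie(self.headers.get("Cookie", ""))
            if "sid" in cookie:
                sid = cookie["sid"].value   # returning client: state came back
            else:
                sid = uuid.uuid4().hex      # new client: mint an identifier
            self.send_response(200)
            self.send_header("Set-Cookie", "sid=" + sid)  # state goes out in a header
            self.end_headers()
            self.wfile.write(("your session id: " + sid).encode())

    HTTPServer(("", 8080), CookieStateHandler).serve_forever()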
SCALABILITY AND HIGH AVAILABILITY
One of the major reasons why HTTP scales so well is its statelessness. A stateless protocol eases replication concerns, as the state itself doesn't need to be stored on the server.
Stateful protocols are logically heavy to implement reliably on the Internet. Stateless servers are also easily scalable, while for stateful servers scalability is problematic. A stateless request can be sent to any node at any time, which is not the case with stateful ones.
HTTP being a stateless protocol increases availability for stateless web applications, which would otherwise be difficult or impossible to implement. If the connection is lost, no state is lost; simply resending the request resolves the problem. Stateless requests are also cacheable.
see more here
Since you're asking about a Web application, the protocol will always be stateless -- the protocol for the Web is http (or https), and that's all she wrote.
I think what you're thinking of is providing a state mechanism in your Web application itself. The typical approach to this is that you create a unique identifier for the user's session in your Web application (a sessionID of one form or another is the common practice) which is handed back and forth between browser and server. That's typically done in a cookie, though it can be done, with a bit more hassle for you depending on your platform/framework, on the URL as well.
Your server-side code stores stateful information (again, typically called the user's session) however it wants to using the sessionID to look it up. The http traffic simply hands back the sessionID. As long as that identifier is there, each http transaction is completely independent of all others, hence the protocol traffic itself is stateless.
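As a toy illustration of that server-side lookup, here is a sketch with an in-memory dict standing in for whatever session store your platform actually provides; all names are illustrative only:

    import uuid

    # sessionID -> per-user state; a real app would use a database or cache.
    SESSIONS = {}

    def get_session(session_id):
        # Return existing state if the browser presented a known sessionID,
        # otherwise mint a fresh identifier with empty state.
        if session_id in SESSIONS:
            return session_id, SESSIONS[session_id]
        new_id = uuid.uuid4().hex
        SESSIONS[new_id] = {}
        return new_id, SESSIONS[new_id]

    # First request arrives with no cookie; later ones hand the id back.
    sid, state = get_session(None)
    state["visits"] = state.get("visits", 0) + 1
    sid2, state2 = get_session(sid)
    assert sid2 == sid and state2["visits"] == 1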
HTTP is a stateless protocol, and web applications built directly on it are stateless too.
When a request is sent to the server, a connection is established between client and server. The server receives the request, processes it and sends back the response, and then the connection is closed.
If another request is sent after that, it is treated as a new request and a new connection is established.
In order to make HTTP stateful, we use session management techniques, so that data coming from previous requests can be used while processing the present request, i.e. a series of client-server interactions is tied to one logical session.
The session management techniques are:
hidden form field
cookie
session
URL-rewriting
Anything that forgets whatever it did in the past is stateless, such as HTTP.
Anything that can keep the history is stateful, such as a database.
HTTP is a stateless protocol; that's why it forgets the user's information.
We can make HTTP behave statefully using JSON Web Tokens (JWT): on each request going to the server, the server first verifies the user using the JWT.
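A small sketch of that JWT flow, assuming the third-party PyJWT library and an HS256 shared secret (both assumptions of this sketch); the token itself carries the user's identity, so the server keeps no per-user state:

    import jwt  # third-party PyJWT; pip install pyjwt

    SECRET = "change-me"  # shared signing key (assumed for this sketch)

    def issue_token(user_id):
        # At login, the server signs the user's identity into the token.
        return jwt.encode({"sub": user_id}, SECRET, algorithm="HS256")

    def verify_request(token):
        # On every request the server checks the signature instead of
        # consulting server-side state; tampering raises InvalidTokenError.
        claims = jwt.decode(token, SECRET, algorithms=["HS256"])
        return claims["sub"]

    token = issue_token("alice")
    assert verify_request(token) == "alice"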
Your question is spot on, and yes, it would be great if your web transactions with your bank were done over a stateful connection. Alas, HTTP is stateless due to a quirky bug in FTP and a 12-socket limit in the partial socket table in BSD of 1989. Marcus Ranum explained it all here.
So HTTP throws away the state it inherits from TCP and has to recreate state at the application layer in the form of cookies. Crappy internet security is the result.
The Seif project proposes to fix all that using "secure JSON over TCP". DNS and certificate authorities are not required. The protocol and seifnode.js are finished and on github with an MIT license.
HTTP doesn't 'inherit' from TCP; rather, it uses TCP as a transport. HTTP uses TCP for a stateful connection, but then disconnects. Later it connects again, if needed. So, while you browse through a web site you create many different connections. Each one of those connections is stateful, but the conversation as a whole is not, since you drop the connection with every exchange.
From this link
Basically yes, but you have no choice but to use HTTP, which is what websites are served over. So you have to accept some compromises to make HTTP stateful, a.k.a. session management. The possibilities are basically passing a session id through each call in the URL, so you know you're talking to someone you've talked to before, or via cookies, which achieve the same goal without cluttering the URL. However, most modern web development languages take care of that for you; if you google for the language of your choice + "session management" you should get some ideas of how it's done.

Can I reuse my existing TCP-Server?

At the moment I have an existing application which basically consists of a desktop GUI and a TCP server. The client connects to the server, and the server notifies the client if something interesting happens.
Now I'm supposed to replace the desktop GUI with a web GUI, and I'm wondering whether I have to rewrite the server to speak HTTP instead of raw TCP, or whether I can somehow use some sort of proxy to grab the TCP messages and forward them to the web client.
Do I need some sort of comet server?
If you can make your client ask your server something like "What's new, pal?" from time to time, you can start implementing an HTTP server emulator over TCP; it's a fun and easy process, and you could have any web-based GUI.
You can just add HTTP headers to your TCP responses; it'll probably do. =)
What I mean is that HTTP is just TCP with some headers, as shown here.
You should probably install Fiddler and monitor some HTTP requests/responses you normally make on the web, and you'll get how to turn your TCP server into an HTTP emulator. =)
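For instance, a hard-coded HTTP response written to a raw Python TCP socket is already enough for a browser to render; this sketch serves one canned response per connection and ignores the request entirely:

    import socket

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", 8080))
    srv.listen(1)

    while True:
        conn, _ = srv.accept()
        conn.recv(4096)  # read and ignore the browser's request
        body = b"whats new pal?"
        # The payload is ordinary TCP data, just prefixed with HTTP headers.
        conn.sendall(b"HTTP/1.1 200 OK\r\n"
                     b"Content-Type: text/plain\r\n"
                     b"Content-Length: " + str(len(body)).encode() + b"\r\n"
                     b"Connection: close\r\n\r\n" + body)
        conn.close()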
If you want to keep the sockets-based approach, use Flash (it has a socket API) or Silverlight (it has a socket API too, and you can go for NetTcpBinding or DuplexBinding or something like that, which would give you the ability to receive messages when the server wants you to receive them, i.e. server push).
So you should probably tell us which back end you plan to use, so we can recommend something more useful.

HTTP Proxy/FastCGI/SCGI not closing connection when client disconnected - bug or feature?

I'm working on Comet support for the CppCMS framework via long XMLHttpRequest polls. In many cases, such a request is closed by the client before any response from the server is given; for example, the page is closed, the user moves to another page, or the page is just refreshed.
On the server side I expect that I would receive a notification that the connection was dropped. I tested the application via three connectors: FastCGI, SCGI and a simple HTTP proxy.
Of the three major UNIX web servers (Apache2, lighttpd and Nginx), only the last one closed the connection as expected, allowing my application to remove the request from the wait queue; this worked for both the FastCGI and HTTP proxy connectors. (Nginx does not have an scgi module by default.)
The others, Apache and lighttpd, do not close the connection or inform the backend about disconnected clients; they proceed as if the client were still online. This happens for all three supported APIs: FastCGI, SCGI and HTTP proxy.
I have opened an issue for lighttpd, but what concerns me more is the fact that Apache, a web server as mature and well supported as lighttpd, also does not disclose to the backend that the client has gone.
Questions:
Is this a bug or a feature? Is there any reason not to close the connection between the web server and the application backend?
Are there real-life Comet applications working behind these servers via FastCGI/SCGI/HTTP-proxy backends?
If so, how do they deal with this issue? I understand that I can time out all connections every 10 seconds, but I would like to keep them idle as long as the client is listening, because this allows easier scale-up; each connection is very cheap, the cost being only the open socket.
Thanks!
(1) Feature. Or, more specifically, fallout from an implementation detail.
A TCP/IP connection does not involve a constant flow of traffic back and forth. Thus, there is no way to know that a client is gone without (a) the client telling you it is closing the connection or (b) a timeout.
(2) I'm not specifically familiar with Comet or CppCMS. But, yes, there are all kinds of CMS servers running behind the mentioned web servers and they all have to deal with this issue (and, yes, it is a pain).
(3) Timeouts are the only way, but you can mitigate the pain, so to speak. Have the client ping the server across the connection every N seconds when there is otherwise no activity. The ping doesn't have to do anything, and you can tack stuff onto the reply: notifications of concurrent edits or whatever you need.
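One common way to realize that heartbeat from the client side of a long poll (a sketch only, assuming a hypothetical /poll endpoint and the third-party requests library): give each poll a deadline and reconnect as soon as it expires, so the reconnect itself serves as the periodic ping.

    import requests  # third-party; pip install requests

    POLL_URL = "http://server.example.com/poll"  # hypothetical Comet endpoint

    def long_poll():
        while True:
            try:
                # The server holds this request open until it has an event.
                # The timeout doubles as the heartbeat interval: when it
                # fires, we reconnect at once, which tells the server that
                # the client is still listening.
                response = requests.get(POLL_URL, timeout=10)
                print("event:", response.text)
            except requests.Timeout:
                continue  # no event within 10s; the reconnect is the ping

    long_poll()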
You are correct in that it is surprising that mod_fastcgi doesn't support telling the backend that Apache has detected the disconnect or the connection timed out. And you aren't the first to be dismayed.
The second patch on this page should fix that particular issue:
http://osdir.com/ml/web.fastcgi.devel/2006-02/msg00015.html
http://ncannasse.fr/blog/tora_comet
I don't have any concrete information for you, but this article does mention that they can detect when the client has disconnected from Apache. See tora.Queue. And it sounds like the source is available in the neko CVS, so you might be able to find some clues there. Good luck.
