I just went through the specification of http 1.1 at http://www.w3.org/Protocols/rfc2616/rfc2616.html and came across a section about connections http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8 that says
" A significant difference between HTTP/1.1 and earlier versions of HTTP is that persistent connections are the default behavior of any HTTP connection. That is, unless otherwise indicated, the client SHOULD assume that the server will maintain a persistent connection, even after error responses from the server.
Persistent connections provide a mechanism by which a client and a server can signal the close of a TCP connection. This signaling takes place using the Connection header field (section 14.10). Once a close has been signaled, the client MUST NOT send any more requests on that connection. "
Then I also went through a section on http state management at https://www.rfc-editor.org/rfc/rfc2965 that says in its section 2 that
"Currently, HTTP servers respond to each client request without relating that request to previous or subsequent requests;"
A section about the need to have persistent connections in the RFC 2616 also said that prior to persistent connections every time a client wished to fetch a url it had to establish a new TCP connection for each and every new request.
Now my question is: if we have persistent connections in HTTP/1.1, then as mentioned above a client does not need to make a new connection for every new request. It can send multiple requests over the same connection. So if the server knows that every subsequent request is coming over the same connection, would it not be obvious that the request is from the same client? Would this not suffice to maintain state, since the server could tell that the requests came from the same client? In that case, why is a separate state management mechanism required at all?
Basically, yes, it would make sense, but HTTP persistent connections are used to eliminate the administrative TCP/IP overhead of connection handling (e.g. connect/disconnect/reconnect, etc.). They are not meant to say anything about the state of the data moving across the connection, which is what you're talking about.
No. For instance, there might be an intermediary (such as a proxy or a reverse proxy) in the request path that aggregates requests from multiple TCP connections.
See http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-21.html#intermediaries.
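The intermediary problem can be illustrated with a small simulation. This is only a sketch: the connection id, the session names, and the cart items are all hypothetical, and no real network traffic is involved. It shows what happens when a server keys its state on the connection versus on an identifier carried in each request.

```python
from collections import defaultdict

# Each request is (connection_id, session_cookie, item). All values are made up.
# A proxy has forwarded requests from two users over ONE upstream connection.
requests = [
    ("conn-1", "alice", "book"),
    ("conn-1", "bob", "lamp"),
    ("conn-1", "alice", "pen"),
]


def carts_by(key_index, reqs):
    """Group the requested items by the chosen field of each request."""
    carts = defaultdict(list)
    for req in reqs:
        carts[req[key_index]].append(req[2])
    return dict(carts)


by_connection = carts_by(0, requests)  # state keyed on the TCP connection
by_cookie = carts_by(1, requests)      # state keyed on per-request data

print(by_connection)
print(by_cookie)
```

Keyed on the connection, both users end up sharing one cart; keyed on the cookie, they stay separate, which is why a state management mechanism independent of the connection is still needed.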
Related
While I am configuring my nginx, I found two modules: ngx_http_limit_conn_module and ngx_http_limit_req_module
one is for limiting connection per defined key, and one for limiting request.
My question is: what is the relationship (and difference) between an HTTP connection and a request?
It seems that multiple HTTP requests can share one HTTP connection. What is the principle behind this?
Basically, connections are established so that requests can be made over them. For instance, an endpoint for a given key may accept 5 connections per hour from a given IP address. That does not mean only 5 requests can be made; many more are possible if the connection is not closed after a request (since HTTP/1.1 it is kept alive by default).
E.g. an endpoint accepts 5 connections and 10 requests from a given IP address. If a connection is established for every request, only 5 requests can be made overall. If the connection is kept alive, a single client may make all 10 requests. If there are 5 clients and each establishes a connection and keeps it alive, each client can make approximately 2 requests; however, one client can make all the requests if it is fast enough.
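The arithmetic in this example can be written down directly. The function below is a hypothetical helper that models only the simplified budget described above (a cap on connections and a cap on requests); it is not how nginx's limit_conn or limit_req modules are implemented.

```python
def max_requests(conn_limit, req_limit, requests_per_connection):
    """Upper bound on requests under both a connection cap and a request cap."""
    return min(req_limit, conn_limit * requests_per_connection)


# One request per connection: the connection cap is hit first.
print(max_requests(conn_limit=5, req_limit=10, requests_per_connection=1))

# Keep-alive, up to 10 requests per connection: the request cap is hit first.
print(max_requests(conn_limit=5, req_limit=10, requests_per_connection=10))

# Five keep-alive clients sharing the request budget evenly.
print(10 // 5)
```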
HTTP connections - client and server introduce themselves.
HTTP requests - client asks something from server.
Making a connection with the server involves the TCP handshake; it is basically creating a socket connection with the server. To make an HTTP request you must already have established a connection with the server. Once the connection is established, you can make multiple requests over it (HTTP/1.0 defaults to one request per connection; HTTP/1.1 defaults to keep-alive). Most web pages need multiple resources from the server (e.g. 100 photos to load on the screen). It is less of a burden on the server if we keep the connection and request those 100 images over it, with no need to go through connection establishment 100 times. That is why HTTP/1.1 made keep-alive the default.
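A minimal sketch of this with Python's standard library: a throwaway local server reports the client's source port, and a client either reuses one connection or opens a new one per request. The handler name and `fetch_ports` are hypothetical names chosen for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPortHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        # Reply with the TCP source port of the client's connection.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_ports(reuse, count=3):
    """Make `count` GET requests and return the source port seen for each."""
    server = HTTPServer(("127.0.0.1", 0), EchoPortHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    ports = []
    conn = http.client.HTTPConnection(host, port)
    try:
        for _ in range(count):
            if not reuse:
                # Simulate "one request per connection".
                conn.close()
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            ports.append(conn.getresponse().read().decode())
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports


print(fetch_ports(reuse=True))
print(fetch_ports(reuse=False))
```

With `reuse=True` every request reports the same source port, because only one TCP handshake took place. With `reuse=False` the ports will typically differ, one per new connection.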
A request is a functional execution: "Do something for me, and return the result back to me" - which is made by the client over a channel that the server is listening on, the "connection". Think of it as making a phone call to a restaurant. When the restaurant picks up the phone, you have an established "connection" - and now can place multiple requests over the same connection. The restaurant can handle multiple, simultaneous customer calls, if it has multiple phone lines open to receive the calls. This is your "connection pool" - at any point in time, you can only have as many simultaneous open connections (max) as the size of your connection pool. The number of requests however will vary. Some client may make 3 requests, and hang up, while other client may make 10 requests before hanging up.
The size of your connection pool determines concurrency - how many simultaneous clients can you talk to at any point in time? The length of those conversations will be use case specific.
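The phone-line analogy can be sketched with a semaphore standing in for the pool. This is an illustration only: `run_callers` and its parameters are hypothetical, threads play the role of clients, and a short sleep stands in for the work of one request.

```python
import threading
import time


def run_callers(pool_size, callers, requests_per_call):
    """Return (peak simultaneous connections, total requests served)."""
    lines = threading.BoundedSemaphore(pool_size)  # the "phone lines"
    lock = threading.Lock()
    stats = {"active": 0, "peak": 0, "requests": 0}

    def call():
        with lines:  # wait for a free line, i.e. an open connection slot
            with lock:
                stats["active"] += 1
                stats["peak"] = max(stats["peak"], stats["active"])
            for _ in range(requests_per_call):  # several requests per call
                time.sleep(0.001)
                with lock:
                    stats["requests"] += 1
            with lock:
                stats["active"] -= 1

    threads = [threading.Thread(target=call) for _ in range(callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["peak"], stats["requests"]


peak, total = run_callers(pool_size=2, callers=6, requests_per_call=3)
print(peak, total)
```

The peak never exceeds the pool size, while the total number of requests depends only on how many requests each caller chooses to make.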
I've read some articles comparing WebSocket with other push methods such as long polling. All the conclusions tend to be that WebSocket is better than HTTP, with lower latency in bidirectional client-server communication.
But if server push is not a must, for example when a client game program just makes a few queries to the server for some information, is it still better to use WebSocket than HTTP? More specifically, I have two doubts here:
1. In a single request-response exchange, which is more efficient? (In the above case I would establish a WebSocket connection for each query.)
2. Will the server capacity (the total number of clients that the server can serve) be affected by the unnecessarily long-lived connection if I keep a WebSocket connection open for the life cycle of the client?
Added Question:
3. Suppose there is only one TCP connection between the server and the client; will the stability of the connection degrade over time?
Both WebSocket and HTTP are built on a socket. HTTP follows a request-response cycle (and, without keep-alive, opens a connection for the request and closes it after the response). WebSocket is two-way (full-duplex) communication rather than a request-response cycle.
Answers to your questions:
1. You can either use an HTTP server or build a request-response design on top of WebSocket.
2. That's obvious. Each connection is a socket object. Server capacity will be affected if connections are not managed.
3. WebSocket uses a ping-pong mechanism to check that the client or the server is still alive. For every ping from one end, the other end is expected to reply with a pong. This mechanism helps to detect failures and hence to maintain stability.
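For the curious, the ping and pong messages are small control frames. The sketch below builds them by hand following the RFC 6455 frame layout (opcode 0x9 for ping, 0xA for pong, control payloads of at most 125 bytes, client-to-server frames masked). The function names are hypothetical, and a real application would normally leave this to a WebSocket library.

```python
import os

OP_PING, OP_PONG = 0x9, 0xA


def control_frame(opcode, payload=b"", mask=False):
    """Build a single WebSocket control frame with the FIN bit set."""
    if len(payload) > 125:
        raise ValueError("control frame payload must be at most 125 bytes")
    first = bytes([0x80 | opcode])  # FIN=1, no extensions, opcode
    if not mask:
        return first + bytes([len(payload)]) + payload
    key = os.urandom(4)  # client-to-server frames carry a masking key
    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return first + bytes([0x80 | len(payload)]) + key + masked


def unmask(frame):
    """Recover the payload of a masked frame built by control_frame."""
    length = frame[1] & 0x7F
    key, data = frame[2:6], frame[6:6 + length]
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))


ping = control_frame(OP_PING, b"hi")             # server -> client, unmasked
pong = control_frame(OP_PONG, b"hi", mask=True)  # client -> server, echoes payload
print(ping, unmask(pong))
```

If a pong echoing the ping's payload does not arrive within some timeout chosen by the application, the peer can be treated as dead and the connection closed.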
If an HTTP persistent connection is kept alive and requests are made on the same socket, without dropping the socket or creating a new one for the next HTTP request, then how can HTTP be stateless, with each request standing on its own, when they share the same socket?
Please correct me if my assumptions are wrong.
Thanks.
HTTP is considered stateless because the browser sends all the information the server works with (cookies, referrer, etc.) in the HTTP request headers.
While there might be a database involved which does store state, HTTP itself is stateless, because the protocol doesn't store anything. Even if the socket is kept open, as long as the protocol doesn't store anything it is still considered stateless.
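As a rough sketch of what "everything the server needs is in the request" means, the hypothetical handler below identifies the user purely from the Cookie header of the current request. The `SESSIONS` table and the cookie name `session` are made up for the example; the table plays the role of the database mentioned above.

```python
from http.cookies import SimpleCookie

# Hypothetical server-side session table (the "database" that stores state).
SESSIONS = {"abc123": "alice"}


def who_is_asking(headers):
    """Identify the user from this request's headers alone."""
    cookie = SimpleCookie(headers.get("Cookie", ""))
    morsel = cookie.get("session")
    if morsel is None:
        return "anonymous"
    return SESSIONS.get(morsel.value, "anonymous")


print(who_is_asking({"Cookie": "session=abc123; theme=dark"}))
print(who_is_asking({}))
```

Nothing here depends on which socket the request arrived on, so the same answer comes back whether the connection is reused or not.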
I am writing a HTTP proxy server and I noticed that many clients use the "Connection: Keep-Alive" header to keep a persistent connection. Is it possible that the client sends another HTTP request before the server processes the first?
For example, the client sends "GET / HTTP/1.1" but before the server has a chance to respond, the client sends "GET /favicon.ico HTTP/1.1". Is that possible? Or will the client pause for the response before sending the second request?
Also, when using a persistent connection, is it safe to assume all requests through that connection will have the same "Host: " header?
"Also, when using a persistent connection, is it safe to assume all requests through that connection will have the same "Host: " header?"
I don't think so, see HTTPbis P1, Section 2.2:
Recipients MUST consider every message in a connection in isolation; because HTTP is a stateless protocol, it cannot be assumed that two requests on the same connection are from the same client or share any other common attributes. In particular, intermediaries might mix requests from different clients into a single server connection. Note that some existing HTTP extensions (e.g., [RFC4559]) violate this requirement, thereby potentially causing interoperability and security problems.
Yes, it is possible for the client to pipeline requests. (See http://en.wikipedia.org/wiki/HTTP_pipelining).
Turning your last question around... it would not be safe for a client to assume that requests to multiple hosts would be served by a single pipeline. There may be no specs that directly address your question on the Host: header, but it's a safe bet they'll be the same.
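The scenario from the question can be tried out locally. The following is a sketch using only the standard library: a throwaway server echoes the Host header and path of each request, and a raw socket writes two requests back to back before reading anything. The handler name, `pipeline_two_requests`, and the host names `a.example` and `b.example` are hypothetical; the two Host values are deliberately different to show that nothing at the protocol level ties them together.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHostHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        body = f"{self.headers['Host']} {self.path}\n".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def pipeline_two_requests():
    """Send two requests before reading any response; return the raw reply."""
    server = HTTPServer(("127.0.0.1", 0), EchoHostHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    raw = (
        b"GET / HTTP/1.1\r\nHost: a.example\r\n\r\n"
        b"GET /favicon.ico HTTP/1.1\r\nHost: b.example\r\n"
        b"Connection: close\r\n\r\n"
    )
    chunks = []
    try:
        with socket.create_connection(server.server_address) as sock:
            sock.sendall(raw)  # both requests are sent before reading anything
            while chunk := sock.recv(4096):
                chunks.append(chunk)
    finally:
        server.shutdown()
        server.server_close()
    return b"".join(chunks).decode()


print(pipeline_two_requests())
```

The responses come back in the order the requests were sent, each reflecting its own Host header. A proxy therefore has to be prepared to find a second request already waiting in its read buffer, and to parse each one in isolation.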
Regarding the first question:
Is it possible that the client sends another HTTP request before the server processes the first?
I believe that yes, it can be possible (perhaps I am wrong, I remembered having read that a couple of years ago; the definitive answer is in the HTTP protocol specifications). But I don't understand why you are asking. Also, the client can open several TCP connections at once to the same HTTP server. And of course you have many simultaneous clients.
About the second question
Also, when using a persistent connection, is it safe to assume all requests through that connection will have the same "Host: " header?
I believe it is usually the case, but I wouldn't assume it with certainty. I could imagine that some clever HTTP clients, recognizing that two URLs with different Host: headers share the same IP, could re-use the same connection.
But I don't understand why you are asking. Persistent HTTP connections have been invented to minimize the TCP connections which are costly, and the two questions you are asking are an extreme point on that. Perhaps few HTTP clients are doing what you describe today.
And you should be strict on what you send (w.r.t. standard conformance), but flexible on what you accept receiving.
I am reading that Keep-Alive is meant for performance, so that connections need not be recreated and existing ones are reused. What if there is a traffic spike, will new connections be created?
Additionally, if I don't turn on Keep-Alive in a high-traffic environment, will the client side eventually run out of connections/socket ports, because a new connection has to be created for each HTTP request?
HTTP is a stateless protocol.
In HTTP 1.0 each request meant opening a new TCP connection.
That caused performance issues (e.g. having to redo the 3-way handshake for each GET or POST), so the Keep-Alive header was added to maintain the connection across requests, and in HTTP/1.1 persistent connections are the default.
This means that the connection is reused across requests.
I am not really familiar with IIS but if there is a configuration to close the connection after each HTTP response, it will have impact on the performance.
Concerning the running out of sockets/ports on the client side, that could occur if the client fires a huge amount of requests and a new TCP connection must be opened per HTTP request.
After a while the ports will be depleted.
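A back-of-the-envelope estimate shows when depletion becomes a concern. The numbers below are assumptions, not values from this thread: an ephemeral port range of 32768 to 60999 and a 60-second TIME_WAIT are commonly cited Linux defaults, and other systems (including Windows/IIS hosts) use different values. The function name is hypothetical.

```python
def sustainable_connections_per_second(port_low, port_high, time_wait_seconds):
    """Rough ceiling on new connections per second to ONE destination.

    Assumes each closed connection holds its source port for the whole
    TIME_WAIT period before the port can be used again.
    """
    usable_ports = port_high - port_low + 1
    return usable_ports / time_wait_seconds


# Assumed values; check the actual settings of the system in question.
rate = sustainable_connections_per_second(32768, 60999, 60)
print(round(rate))
```

Under these assumptions a client opening a new connection per request to a single server address starts running out of ports at a few hundred requests per second, whereas with keep-alive the same load can be carried by a handful of long-lived connections.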