Why is HTTP protocol stateless if it can deal with persistent connections? - http

The HTTP protocol is stateless, but I found this on the Kurose-Ross book:
The default HTTP method is with persistent connections and pipeling.
This means that it can handle multiple requests, so it keeps opened the socket of a client that wants to ask multiple requests.Is that true? If yes, why is HTTP protocol considered stateless?

HTTP persistent connections relate to TCP connection being left open. HTTP operates on top of TCP - so TCP can be connected and/or stateful whereas HTTP would not. TCP is just the transport for HTTP.
If you look at the OSI model, you can see that TCP is on layer 4 (transport), whereas HTTP is on layer 7 (application). HTTP is not tied to TCP and could use other ways of transport too - as a protocol, it is not "inheriting" features from TCP.
(Note also that the persistent connection is not really persistent for a very long time. For Apache 2, it is open only for 5 seconds per default, and "According to RFC 2616 (page 46), a single-user client should not maintain more than 2 connections with any server or proxy".)

Related

HTTP protocol connectionless and use TCP by default, How does that make sense?

I read that HTTP protocol uses the reliable TCP connections by default
,and one of the HTTP feature is connectionless.
Now I am confused how does it make sense ?
How does it use TCP and the same time it is connectionless and as I know TCP is connection-oriented
HTTP and TCP are different things. TCP is a transport layer protocol, whereas HTTP is an application layer protocol. HTTP uses TCP for data transmissions.
IMO this website has a nice explanation:
HTTP is connectionless: The HTTP client, i.e., a browser initiates an HTTP request and after a request is made, the client waits for the response. The server processes the request and sends a response back after which client disconnect the connection. So client and server knows about each other during current request and response only. Further requests are made on new connection like client and server are new to each other.
However, Wikipedia defines HTTP as stateless:
HTTP is a stateless protocol. A stateless protocol does not require the HTTP server to retain information or status about each user for the duration of multiple requests. However, some web applications implement states or server side sessions using for instance HTTP cookies or hidden variables within web forms.
Based on their explanations, the terms seems to be used interchangeably. However, these are not really true as the in-use HTTP versions allow you to identify the users through cookies etc. and create persistent connections.

Persistence is at which layer?

I know that a http request first makes a 3 way handshake to establish connection. Followed by the request and response.
If a handshake is required for future requests then it is called non persistent connection.
The server can choose to keep the connection alive so that a handshake is not required untill a timeout value (persistent). This is called persistent connection. It saves time required by not requiring the 3 way handshake for each request.
My colleague mentions that http supports both persistent and non persistent. My understanding is that - tcp makes the connection. So persistence is controlled by tcp layer. Am I right?
May be not right. HTTP is higher layer than TCP and HTTP 1.0 will send close() when they finish tranportint some data streams. But in HTTP 1.1, the controller will not send close(), instead, it'll send keepalive/hearbeat to the other side for live. It is controlled by the application layer, in other words, by the HTTP itself.
One way HTTP can support a persistent connection is called Server-Sent-Events.
An alternative to HTTP for persistant connections is WebSocket. WebSocket is a computer communications protocol, providing full-duplex communication channels over a single TCP connection. WebSocket enables streams of messages on top of TCP. WebSocket is distinct from HTTP.

How Connection Disconnects in HTTP?

As we all know that the HTTP is stateless protocol, means it disconnects the connection after the end of every transaction.
But that's not enough for me to understand it. What makes me confuse that, Is the TCP connection also ends?
As HTTP is a TCP network protocol, so it talks to other nodes through the TCP pipe. So Is the stateless means that the TCP connection also ends?
So, it will make another TCP connection by using another TCP 3 way-handshake?
What makes the protocol stateless is that the server is not required to track state over multiple requests, not that it cannot do so if it wants to. This simplifies the contract between client and server, and in many cases minimizes the amount of data that needs to be transferred.and the Internet is a stateless development environment" is often used. This simply means that the HTTP that is the backbone of the Web is unable to retain a memory of the identity of each client that connects to a Web site and therefore treats each request for a Web page as a unique and independent connection, with no relationship whatsoever to the connections that preceded it.
As we all know that the HTTP is stateless protocol, means it disconnects the connection after the end of every transaction.
No, we don't all know that it means that at all: no, it doesn't mean that at all; and no, it doesn't do that at all.
What makes me confuse that, Is the TCP connection also ends?
The TCP connection ends when the HTTP connection ends.
As HTTP is a TCP network protocol, so it talks to other nodes through the TCP pipe.
Correct.
So Is the stateless means that the TCP connection also ends?
No. The HTTP and TCP connections can be persistent, and they are by default from HTTP 1.1 (previously via the so-called KEEPALIVE feature). This has nothing to do with statelessness.
So, it will make another TCP connection by using another TCP 3 way-handshake?
Yes, whenever required, but that isn't as often as you seem to think. You are conflating two different aspects of HTTP.

How does a browser know which response belongs to which request?

Suppose when we request a resource over HTTP, we get a response as shown below:
GET / HTTP/1.1
Host: www.google.co.in
HTTP/1.1 200 OK
Date: Thu, 20 Apr 2017 10:03:16 GMT
...
But when a browser requests many resources at a time, how can it identify which request got which response?
when a browser requests many resources at a time, how can it identify which request got which response?
A browser can open one or more connections to a web server in order to request resources. For each of those connections the rules regarding HTTP keep-alive are the same and apply to both HTTP 1.0 and 1.1:
If HTTP keep-alive is off, the request is sent by the client, the response is sent by the server, the connection is closed:
Connection 1: [Open][Request1][Response1][Close]
If HTTP keep-alive is on, one "persistent" connection can be reused for succeeding requests. The requests are still issued serially over the same connection, so:
Connection 1: [Open][Request1][Response1][Request3][Response3][Close]
Connection 2: [Open][Request2][Response2][Request4][Response4][Close]
With HTTP Pipelining, introduced with HTTP 1.1, if it is enabled (on most browsers it is by default disabled, because of buggy servers), browsers can issue requests after each other without waiting for the response, but the responses are still returned in the same order as they were requested.
This can happen simultaneously over multiple (persistent) connections:
Connection 1: [Open][Request1][Request2][Response1][Response2][Close]
Connection 2: [Open][Request3][Request4][Response3][Response4][Close]
Both approaches (keep-alive and pipelining) still utilize the default "request-response" mechanism of HTTP: each response will arrive in the order of the requests on that same connection. They also have the "head of line blocking" problem: if [Response1] is slow and/or big, it holds up all responses that follow on that connection.
Enter HTTP 2 multiplexing: What is the difference between HTTP/1.1 pipelining and HTTP/2 multiplexing?. Here, a response can be fragmented, allowing a single TCP connection to transmit fragments of different requests and responses intermingled:
Connection 1: [Open][Rq1][Rq2][Resp1P1][Resp2P1][Rep2P2][Resp1P2][Close]
It does this by giving each fragment an identifier to indicate to which request-response pair it belongs, so the receiver can recompose the message.
I think you are really asking for HTTP Pipelining here. This is a technique introduced in HTTP/1.1, through which all requests would be sent out by the client in order and be responded by the server in the very same order. All the gory details are now in RFC 7230, sec. 6.3.2.
HTTP/1.0 had (or has) a comparable method known as Keep Alive. This would allow a client to issue a new request right after the previous has been answered. The benefit of this approach is that client and server no longer need to negotiate through another TCP handshake for a new request/response cycle.
The important part is that in both methods the order of the responses matches the order of the issued requests over one connection. Therefore, responses can be uniquely mapped to the issuing requests by the order in which the client is receiving them: First response matches, first request, second response matches second request, … and so forth.
I think the answer you are looking for is TCP,
HTTP is a protocol that relies on TCP to establish connection between the Client and the Host
In HTTP/1.0 a different TCP connection is created for each request/response pair,
HTTP/1.1 introduced pipelining, wich allowed mutiple request/response pair, to reuse a single TCP connection, to boost performance (Didnt work very well)
So the request and the corresponding response are linked by the TCP connection they rely on,
It's then easy to associate a specific request with the response it produced,
PS: HTTP is not bound to use TCP forever, for example google is experimenting with other transport protocols like QUIC, that might end up being more efficient than TCP for the needs of HTTP
In addition to the explanations above consider a browser can open many parallel connections, usually up to 6 to the same server. For each connection it uses a different socket. For each request-response in each socket it is easy to determine the correlation.
In each TCP connection, request and response are sequential. A TCP connection can be re-used after finishing a request-response cycle.
With HTTP pipelining, a single connection can be multiplexed for
multiple overlapping requests.
In theory, there can be any number[*1] of simultaneous TCP connections, enabling parallel requests and responses.
In practice, the number of simultaneous connections is usually limited on the browser and often on the server as well.
[*1] The number of simultaneous connections is limited by the number of ephemeral TCP ports the browser can allocate on a given system. Depending on the operating system, ephemeral ports start at 1024 (RFC 6056), 49152 (IANA), or 32768 (some Linux versions).
So, that may allow up to 65,535 - 1023 = 64,512 TCP source ports for an application. A TCP socket connection is defined by its local port number, the local IP address, the remote port number and the remote IP address. Assuming the server uses a single IP address and port number, the limit is the number of local ports you can use.

HTTP: what are the relations between pipelining, keep-alive and Server Sent Events?

I am trying to understand what are the HTTP pipelining and HTTP keep-alive connections, and trying to establish a connection between these two topics and Server Sent events technology.
As far as I understand,
HTTP keep-alive connection is the default in HTTP 1.1 way of using TCP when the established once TCP connection is used for sending several HTTP requests one by one.
HTTP pipelining is the capability of client to send requests to server while responses to previous requests were not yet received using the same TCP connection, generally not used as a default way in browsers.
My questions:
1) if it is possible to send several requests to server one after one using one TCP connection - how the client can distinguish between the responses? I guess client is using FIFO order of sending responses by server?
2) Why non-idempotent requests such as POST requests shouldn't be pipelined (according to wikipedia)?
3) What about the limitations of the web-server: is the number of possible open TCP connections limited? If yes, then if some number of clients hold keep-alive connections others cannot establish connections, and this can result in a problem, right?
4) Server Sent Events are using the keep-alive connection but, as far as I understand, SSE are not using pipelining. Instead they manage to process several responses to one request, or may be they just send another request when the next response with event arrived. Which guess is correct?
5) One TCP connection means one socket? One socket means one TCP connection? Closing/opening socket means closing/opening TCP connection?
Yes, FIFO. TCP/IP guarantees delivering data in-order, so responses can't arrive in a different order (if the server/proxy is buggy and sends responses in wrong order then you're totally screwed).
I don't recall any reason per HTTP spec. It may be just caution, because pipelining is poorly implemented in some proxies/servers.
HTTP spec suggests 2 connections per server, browsers have settled on 6-8 connections per server, but there is no fixed limit. Running out of connections is a real problem for Apache, and for high-load situations it's recommended to disable KeepAlive in Apache and use a proxy (e.g. HAProxy) that can cheaply provide Keep-Alive functionality to clients.
The benefit of a proxy is that one proxy can distribute connections to several servers (helps scaling), or can modify the traffic (e.g. gzip compress everything even if server-side-software didn't).
SSE doesn't rely on Keep-Alive. It's not using multiple responses. It's a single response that takes forever to "download", so pipelining or keep-alive are irrelevant for SSE. The TCP/IP connection cannot return any more responses while SSE response is being sent.
SSE will keep the server busy as long as the connection is open (so typicall all the time for every user). That's why it's best to use SSE with Node.js/Tornado that can handle hundreds of thousands connections rather than PHP/Apache that is designed for few connections at a time.
Sockets are programming interface for TCP/IP connections. Generally yes, one socket is one connection.

Resources