From my understanding, HTTP/2 comes with a stateful header compression called HPACK. Doesn't it change the stateless semantics of the HTTP protocol? Is it safe for web applications to consider HTTP/2 as a stateless protocol? Finally, will HTTP/2 be compatible with the existing load balancers?
HTTP/2 is stateless.
Original HTTP is a stateless protocol, meaning that each request message can be understood in isolation. This means that every request needs to bring with it as much detail as the server needs to serve that request, without the server having to store information and metadata from previous requests.
Since HTTP/2 doesn't change this paradigm, it works the same way: stateless.
This is clearly stated in the official RFCs as well:
The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol which can be used for many tasks...
and the definition of HTTP/2 says:
This specification describes an optimized expression of the semantics of the Hypertext Transfer Protocol (HTTP), referred to as HTTP version 2 (HTTP/2)... This specification is an alternative to, but does not obsolete, the HTTP/1.1 message syntax. HTTP's existing semantics remain unchanged.
Conclusion
The HTTP/2 protocol is stateless by design, as its semantics remain unchanged compared to the original HTTP.
Where the confusion may come from
An HTTP/2 connection is an application-layer protocol running on top of a TCP connection (by the way, nothing stops you from using HTTP over UDP, for example; it's possible, but UDP is not used because it is not a "reliable transport"). Don't mix this up with the session and transport layers. The HTTP protocol is stateless by design.
HTTP over an encrypted SSL/TLS connection also changes nothing about this statement, as the "S" in HTTPS is concerned with the transport, not the protocol itself.
HPACK, Header Compression for HTTP/2, is a compression format especially crafted for HTTP/2 headers, and it is specified in its own RFC. It doesn't change HTTP/2 itself, so it doesn't change the semantics.
The HTTP/2 RFC, in the section about HPACK, states:
Header compression is stateful. One compression context and one
decompression context are used for the entire connection.
And here's why from HPACK's RFC:
2.2. Encoding and Decoding Contexts
To decompress header blocks, a decoder only needs to maintain a
dynamic table (see Section 2.3.2) as a decoding context. No other
dynamic state is needed.
When used for bidirectional communication, such as in HTTP, the
encoding and decoding dynamic tables maintained by an endpoint are
completely independent, i.e., the request and response dynamic tables
are separate.
HPACK reduces the length of header field encoding by exploiting the
redundancy inherent in protocols like HTTP. The ultimate goal of
this is to reduce the amount of data that is required to send HTTP
requests or responses.
An HPACK implementation cannot be completely stateless, because the encoding and decoding tables, although completely independent of each other, have to be maintained by an endpoint.
At the same time, there are libraries that try to address HPACK's statefulness, for example the stateless, event-driven HPACK codec CASHPACK:
An HPACK implementation cannot be completely stateless, because a dynamic table needs to be maintained. Relying on the assumption that HTTP/2 will always decode complete HPACK sequences, statelessness is achieved using an event-driven API.
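To see this per-connection state in action, here is a minimal sketch assuming the third-party Python hpack package from the python-hyper project (pip install hpack): it encodes the same header list twice, and the second block comes out smaller because encoder and decoder now share a populated dynamic table.

    # A minimal sketch, assuming the third-party "hpack" package is installed.
    from hpack import Encoder, Decoder

    encoder = Encoder()   # holds the encoding dynamic table (per-connection state)
    decoder = Decoder()   # holds the matching decoding dynamic table

    headers = [
        (":method", "GET"),
        (":path", "/index.html"),
        ("user-agent", "example-client/1.0"),
        ("x-custom-header", "some fairly long repeated value"),
    ]

    # First encoding: most fields are emitted as literals and inserted
    # into the dynamic table.
    first = encoder.encode(headers)

    # Second encoding of the same headers: the encoder can now reference
    # entries already sitting in the dynamic table, so the output shrinks.
    second = encoder.encode(headers)

    print(len(first), len(second))   # e.g. the second block is only a few bytes

    # The decoder must replay the blocks in order to keep its table in sync;
    # that shared, ordered context is exactly the per-connection state.
    print(decoder.decode(first))
    print(decoder.decode(second))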
Modern HTTP, including HTTP/2, is a stateful protocol. Old-timey HTTP was stateless.
Many HTTP/2 components are the very definition of stateful.
No reasonable person can read the HTTP/2 RFC and think it is stateless. The errant old-time dogma that "HTTP is stateless" is false and doesn't represent the current reality of HTTP.
Here's a limited, non-exhaustive list of stateful HTTP/1 and HTTP/2 components:
Cookies (named "HTTP State Management Mechanism" by the RFC)
HTTPS, which stores keys and thus state
HTTP authentication requires state
Web Storage
HTTP caching is stateful
The very purpose of the stream identifier is state
Header blocks, which establish stream identifiers, are stateful.
Frames which reference stream identifiers are stateful
Header Compression, which the HTTP RFC explicitly says is stateful, is stateful.
Opportunistic encryption is stateful.
Section 5.1 of the HTTP/2 RFC is a great example of stateful mechanisms defined by the HTTP/2 standard.
Is it safe for web applications to consider HTTP/2 as a stateless protocol?
HTTP/2 is a stateful protocol, but that doesn't mean your HTTP/2 application can't be stateless. You can build a stateless HTTP/2 application by choosing not to use certain stateful features, i.e., by using only a subset of HTTP/2 features, as in the sketch below.
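A minimal sketch of that idea, assuming the third-party httpx client with its HTTP/2 extra installed (pip install "httpx[http2]") and using example.org purely as a placeholder URL:

    # A minimal sketch, assuming the third-party "httpx" client with HTTP/2 support.
    import httpx

    # No cookies, no server-side session: each request carries everything
    # the server needs (method, path, headers, credentials if any).
    with httpx.Client(http2=True) as client:
        response = client.get("https://example.org/")   # placeholder URL
        print(response.http_version)                     # "HTTP/2" if negotiated
        print(response.status_code)

    # The connection itself still holds protocol-level state (HPACK tables,
    # stream identifiers, flow-control windows), but the application built
    # on top of it remains stateless.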
Cookies and some other, less obvious stateful mechanisms are later additions to HTTP. HTTP/1 is said to be stateless, although in practice we use standardized stateful mechanisms. Unlike HTTP/1.0, HTTP/2 defines stateful components in its standard and is therefore stateful. A particular HTTP/2 application can use a subset of HTTP/2 features to maintain statelessness.
Existing applications, even HTTP/1 applications, that need state will break if you try to use them statelessly. It can be impossible to log into some HTTP/1.1 websites if cookies are disabled, thus breaking the application. It may not be safe to assume that a particular HTTP/1 application does not use state. This is no different for HTTP/2. Before Netscape invented cookies and HTTPS in 1994, HTTP could be considered stateless.
Say it with me one last time:
HTTP/2 is a stateful protocol.
Related
What does "operations are done with gRPC, over HTTP/2" mean? I am interested in knowing how gRPC and HTTP/2 play along.
gRPC is a protocol that uses HTTP/2. The messages you send are encoded as gRPC frames (with a 5-byte header) and packaged into HTTP/2 DATA frames. The HTTP/2 HEADERS frames are used to propagate headers and trailers at the beginning and end of the call.
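As a rough illustration of that 5-byte header (a 1-byte compressed flag plus a 4-byte big-endian message length, per the gRPC wire format), here is a small Python sketch; the payload bytes are just a placeholder standing in for a serialized protobuf message:

    # A minimal sketch of gRPC's length-prefixed message framing.
    import struct

    def grpc_frame(message: bytes, compressed: bool = False) -> bytes:
        """Wrap a serialized protobuf message in the 5-byte gRPC frame header."""
        return struct.pack(">BI", 1 if compressed else 0, len(message)) + message

    def grpc_unframe(data: bytes) -> bytes:
        """Strip the 5-byte header and return the message body."""
        compressed, length = struct.unpack(">BI", data[:5])
        assert not compressed, "this sketch does not handle compressed messages"
        return data[5 : 5 + length]

    payload = b"\x0a\x05hello"        # placeholder bytes standing in for protobuf
    framed = grpc_frame(payload)      # this is what gets carried in HTTP/2 DATA frames
    assert grpc_unframe(framed) == payload
    print(framed.hex())               # 00 00000007 0a0568656c6c6f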
It would be possible to use gRPC over other protocols, though this is less common as of this writing. For example:
gRPC can be used in-process, meaning there is no wire encoding. You still get to use the same gRPC API and stubs, though. This is commonly used for testing.
QUIC: This is a UDP-based protocol that is an alternative to HTTP/2, but which has HTTP semantics. This is used on Android Java when using the AndroidChannelBuilder.
HTTP/1.1: This is used for gRPC-Web. Some minor modifications are needed to the gRPC protocol, but it can work from regular web browsers, which currently don't support certain parts of HTTP/2.
Unlike the original stateless HTTP/1, HTTP/2 has many stateful components.
These parts of HTTP/2 are stateful:
Header blocks refer to a stateful unsigned 31-bit integer called a stream identifier
Frames also reference the same stateful stream identifier
Opportunistic encryption also depends on state, since TLS is stateful
Are there any other parts of HTTP/2 that are stateful?
HTTP 2 adds many stateful components to the HTTP corpus.
Streams use a stateful unsigned 31-bit integer called a "stream identifier".
Header blocks are used to statefully establish the stream identifier.
Frames are stateful.
Header compression is stateful.
Opportunistic encryption is stateful.
The following are stateful components carried over from previous HTTP additions that are still stateful in HTTP/2.
Cookies are stateful.
HTTPS is stateful.
HTTP caching is stateful. (See also RFC 7234.)
HTTP defined authentication is stateful. (See also RFC 7235)
Web Sockets are stateful and use a stateful "Sec-WebSocket-Key" HTTP header.
Web Storage, although not HTTP, is also stateful.
HTTP, the protocol residing over the TCP protocol, is stateless, and the IP protocol is also stateless.
But how can we conclude whether TCP is stateless or not?
You can't assume that any stacked protocol is stateful or stateless just by looking at the other protocols on the stack. Stateful protocols can be built on top of stateless protocols, and stateless protocols can be built on top of stateful protocols. One of the points of a layered network model is that the kind of relationship you're looking for (statefulness of a given protocol as a function of the protocols it's used in conjunction with) does not exist.
The TCP protocol is a stateful protocol because of what it is, not because it is used over IP or because HTTP is built on top of it. TCP maintains state in the form of a window size (endpoints tell each other how much data they're ready to receive) and packet order (endpoints must confirm to each other when they receive a packet from the other). This state (how many bytes the other side can receive, and whether or not it received the last packet) allows TCP to be reliable even over inherently unreliable protocols. Therefore, TCP is a stateful protocol because it needs state to be useful.
I would also like to point out that while HTTP and HTTPS (which is just HTTP over SSL/TLS, really) are essentially stateless (each request is a valid standalone request per the protocol), applications built on top of HTTP and HTTPS aren't necessarily stateless. For instance, a website can require you to visit a login page before sending a message. Even though the request where the client sends a message is a valid standalone request, the application will not accept it unless the client authenticated herself beforehand. This means that the application implements state over HTTP.
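A minimal sketch of that pattern, using the third-party requests library and a hypothetical site with /login and /messages endpoints (the URLs and form fields are placeholders):

    # A minimal sketch; the site, endpoints, and credentials are hypothetical.
    import requests

    session = requests.Session()   # remembers cookies across requests

    # Each HTTP request below is still a valid standalone request at the
    # protocol level; the *application* state lives in the session cookie.
    session.post("https://example.org/login",
                 data={"username": "alice", "password": "secret"})

    # The session cookie set by the login response is replayed automatically
    # here; without it, the application would reject the request.
    response = session.post("https://example.org/messages",
                            data={"body": "hello"})
    print(response.status_code)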
On a side note, the statefulness of HTTP can be somewhat confusing, as several applications (on a clearly different OSI layer) will leak their state to HTTP. For instance, if a user tries to view a blog post that doesn't exist, the blog application might send back a response with the 404 status code, even though the file handling the blog post search itself was found.
tl;dr TCP is stateful.
While Zneak points out that you can use any communication for stateful purposes, the ACTUAL question being asked is whether the protocol itself is stateful.
Wikipedia:
In computing, a stateless protocol is a communications protocol that
treats each request as an independent transaction that is unrelated to
any previous request so that the communication consists of independent
pairs of requests and responses. A stateless protocol does not require the server to retain
session information or status about each communications partner for
the duration of multiple requests. In contrast, a protocol which
requires keeping of the internal state on the server is known as a
stateful protocol.
TCP's "request" (unit of communication) is a TCP packet.
TCP is a stateful protocol, since the parties must remember what state the other is in and what bytes the other has received. Hence the TCP state diagram.
In contrast, UDP is a stateless protocol. Neither endpoint retains any notion of state. (Though as always, the encapsulated information could be used for stateful purposes.)
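A small sketch with Python's standard socket module makes the contrast visible: TCP needs a handshake before any data flows and both ends track the connection afterwards, while UDP just fires a datagram with no prior setup (example.org is only a placeholder host):

    # A minimal sketch using the standard library; example.org is a placeholder.
    import socket

    # TCP: connect() runs the three-way handshake; from then on both endpoints
    # keep per-connection state (sequence numbers, window sizes, connection
    # status), which is torn down again on close().
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.connect(("example.org", 80))
    print(tcp.getpeername())   # the kernel knows exactly who this connection is with
    tcp.close()

    # UDP: no handshake and no connection state; each datagram stands alone.
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.sendto(b"hello", ("example.org", 9))   # "discard" port; nothing is remembered
    udp.close()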
Here is a nice explanation:
Consider the phone service to be TCP and consider your relationship with distant family members to be HTTP. You will contact them with the phone service. Each call to them would be a stateful TCP connection. However, you don't constantly stay on the phone with them, as you will disconnect and call them back again at a later time. You would certainly expect them to remember what you talked about on the last call. HTTP in itself does not do that; it is rather a function of the web server that maintains the state of the overall conversation.
To properly answer the question, we need the concept of a stateless protocol used to manage external stateful resources. Section 2.4 of http://laurel.datsi.fi.upm.es/_media/docencia/asignaturas/ws-modelingresources.pdf is about a service that implements such a protocol:
A Service that acts upon stateful resources may be described
“stateless” if it delegates responsibility for the management of the
state to another component such as a database or file system. ... A
consequence of statelessness is that any dynamic state needed for a
given message-exchange execution must be:
provided explicitly within the request message, whether directly by-value or indirectly by-reference, and/or
maintained implicitly within other system components with which the Web service can interact.
So the HTTP protocol is stateless if we consider the files that are served, the database that is accessed, and so on to be separate from the implementation of the protocol itself. A service (which implements a protocol) that is stateless with respect to both sides taken together might not appear stateless on either side, because the other side can carry state.
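A minimal sketch of such a "stateless" service, with all names (handle_request, orders_db) being illustrative placeholders: the handler itself remembers nothing between calls, and any dynamic state arrives by value in the request or is looked up by reference in an external store.

    # A minimal sketch: the handler holds no state between calls.
    orders_db = {"42": {"status": "shipped"}}   # stands in for a real database

    def handle_request(request: dict) -> dict:
        """A 'stateless' handler: dynamic state is either passed by value in
        the request or fetched by reference (order_id) from the database."""
        order_id = request["order_id"]              # state by reference
        customer_note = request.get("note", "")     # state by value
        order = orders_db.get(order_id)             # delegated, external state
        if order is None:
            return {"status": 404}
        return {"status": 200, "order": order, "note": customer_note}

    print(handle_request({"order_id": "42", "note": "leave at door"}))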
There's too much elaboration about the HTTP protocol. But in its essence, it's nothing but a string of ASCII characters transmitted over the TCP protocol, and the string defines the semantics of the protocol. Am I right about this?
If so, two questions follow:
Can we devise any protocol we want, since it just looks like passing strings over the internet?
Why don't we compress the HTTP strings before we pass them down to the TCP level?
That's right, HTTP is by no means special, but because it underpins the web it receives a lot of attention. It's an application-level protocol like SMTP or FTP or any other.
Yes, you could design any protocol you like. For fun, grab an RFC for SMTP, FTP or HTTP and connect to your own server and learn the protocol. RFC2324 is also required reading - http://www.faqs.org/rfcs/rfc2324.html
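To convince yourself that HTTP really is just lines of text over TCP, here is a minimal sketch that speaks HTTP/1.1 by hand over a plain socket (example.org is just a convenient public host):

    # A minimal sketch: speaking HTTP/1.1 "by hand" over a raw TCP socket.
    import socket

    sock = socket.create_connection(("example.org", 80))
    request = (
        "GET / HTTP/1.1\r\n"
        "Host: example.org\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    sock.sendall(request.encode("ascii"))

    response = b""
    while chunk := sock.recv(4096):   # read until the server closes the connection
        response += chunk
    sock.close()

    print(response.decode("iso-8859-1").split("\r\n\r\n", 1)[0])   # just the headers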
Lack of HTTP header compression has been talked about a lot in recent years. See Steve Souders blog/books, YSlow! and Google Page Speed sites. The SPDY protocol is probably going to be the front runner at addressing several of the current issues with HTTP connection management, performance and security - http://www.chromium.org/spdy/spdy-whitepaper
Sure. But you would have to get others to adopt your protocol (unless it is an internal/proprietary spec). And if you can coherently express your communique in the form of HTTP, why not use it? It's widely implemented in virtually every language and operating system, and is well understood and easily debugged. Don't just create protocols for the heck of it.
The HTTP specification provides for several common compression schemes. gzip and deflate are particularly widely used. See, for example, Apache's mod_gzip and mod_deflate. Clients and servers routinely negotiate compression on your behalf.
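A minimal sketch of that negotiation using only the standard library; example.org is a placeholder and a real server may choose not to compress, hence the check on the response header:

    # A minimal sketch of content-encoding negotiation; example.org is a placeholder.
    import gzip
    import http.client

    conn = http.client.HTTPSConnection("example.org")
    conn.request("GET", "/", headers={"Accept-Encoding": "gzip"})   # client advertises gzip
    resp = conn.getresponse()
    body = resp.read()

    if resp.getheader("Content-Encoding") == "gzip":
        body = gzip.decompress(body)   # the server honoured the negotiation

    print(resp.status, len(body))
    conn.close()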
I've been working with websockets lately in detail. Created my own server and there's a public demo. I don't have such detailed experience or knowledge re: http. (Although since websocket requests are upgraded http requests, I have some.)
On my end, the server reports details of each hit. Among them are a bunch of http keep-alive requests. My server doesn't handle them because they're not websocket requests. But it got my curiosity up.
The whole big thing about websockets is that the connection stays alive. Then you can pass messages in both directions (simultaneously even). I've read that the Keep-Alive HTTP connection is a relatively new development (I don't know how many years in people time, just that it's only included in the latest standard - 1.1 - is that actually old now?)
I guess I can assume that there's a behavioral difference between the two or there would have been no reason for a websocket standard? What's the difference?
The Keep-Alive HTTP header has existed since HTTP/1.0; it is used to indicate that an HTTP client would like to maintain a persistent connection with the HTTP server. The main objective is to eliminate the need to open a new TCP connection for each HTTP request. However, while a persistent connection is open, the communication between client and server still follows the basic HTTP request/response pattern. In other words, the server side can't push data to the client.
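A minimal sketch of a persistent connection with Python's standard http.client, assuming the server keeps the connection open (example.org is a placeholder host): two request/response round trips travel over one TCP connection, but each exchange is still initiated by the client.

    # A minimal sketch of a persistent (keep-alive) HTTP connection.
    import http.client

    conn = http.client.HTTPSConnection("example.org")   # one underlying TCP connection

    for path in ("/", "/"):
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()          # the body must be consumed before reusing the connection
        print(resp.status, resp.getheader("Connection"))

    conn.close()

    # The server never speaks first on this connection; it can only answer
    # requests, which is exactly the limitation WebSockets remove.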
WebSocket is a completely different mechanism, used to set up a persistent, full-duplex connection. With this full-duplex connection, the server side can push data to the client, and the client should be prepared to process data from the server side at any time.
Quoting corresponding entries on Wikipedia for reference:
1) http://en.wikipedia.org/wiki/HTTP_persistent_connection
2) http://en.wikipedia.org/wiki/WebSocket
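For contrast, a minimal WebSocket sketch assuming the third-party websockets package (pip install websockets) and a public echo endpoint; the URL is a placeholder, such echo servers come and go, and some send a greeting message before echoing:

    # A minimal sketch, assuming the third-party "websockets" package.
    import asyncio
    import websockets

    async def main() -> None:
        async with websockets.connect("wss://echo.websocket.org") as ws:
            # The same connection carries traffic in both directions; the
            # server could also push a message at any time without a request.
            await ws.send("hello")
            reply = await ws.recv()
            print(reply)

    asyncio.run(main())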
You should read up on COMET, a design pattern which shows the limits of HTTP Keep-Alive. Keep-Alive is over 12 years old now, so it's not a new feature of HTTP. The problem is that it's not sufficient; the client and server cannot communicate in a truly asynchronous manner. The client must always use a "hanging" request in order to get a message back from the server; the server may not just send a message to the client at any time it wants.
HTTP vs Websockets
REST (HTTP)
Resources benefit from caching when the representation of a resource changes rarely or multiple clients are expected to retrieve the resource.
HTTP methods have well-known idempotency and safety properties. A request is “idempotent” if it can be issued multiple times without resulting in unique outcomes.
The HTTP design allows for responses to describe errors with the request, with the resource, or to provide nuanced status information to differentiate between success scenarios.
Has request and response functionality.
HTTP/1.1 may allow multiple requests to reuse a single connection, but there will generally be small timeout periods intended to control resource consumption.
You might be using HTTP incorrectly if…
Your design relies on a client polling the service often, without the user taking action.
Your design requires frequent service calls to send small messages.
The client needs to quickly react to a change to a resource, and it cannot predict when the change will occur.
The resulting design is cost-prohibitive. Ask yourself: Is a WebSocket solution substantially less effort to design, implement, test, and operate?
WebSockets
WebSocket design does not allow explicit or transparent proxies to cache messages, which can degrade client performance.
The WebSocket protocol offers support only for error scenarios affecting the establishment of the connection. Once the connection is established and messages are exchanged, any additional error scenarios must be addressed in the messaging layer design. On the other hand, WebSockets allow for greater efficiency compared to REST because they do not require the HTTP request/response overhead for each message sent and received.
When a client needs to react quickly to a change (especially one it cannot predict), a WebSocket may be best.
This makes the protocol well suited to “fire and forget” messaging scenarios and poorly suited for transactional requirements.
WebSockets were designed specifically for long-lived connection scenarios; they avoid the overhead of establishing connections and sending HTTP request/response headers, resulting in a significant performance boost.
You might be using WebSockets incorrectly if..
The connection is used only for a very small number of events, or a very small amount of time, and the client does not need to react quickly to the events.
Your feature requires multiple WebSockets to be open to the same service at once.
Your feature opens a WebSocket, sends messages, then closes it—then repeats the process later.
You’re re-implementing a request/response pattern within the messaging layer.
The resulting design is cost-prohibitive. Ask yourself: Is an HTTP solution substantially less effort to design, implement, test, and operate?
Ref: https://blogs.windows.com/buildingapps/2016/03/14/when-to-use-a-http-call-instead-of-a-websocket-or-http-2-0/