Unlike the original stateless HTTP/1, HTTP/2 has many stateful components.
These parts of HTTP/2 are stateful:
Header blocks refer to a stateful unsigned 31-bit integer called a stream identifier
Frames also reference the same stateful stream identifier
Opportunistic encryption also depends on state, since TLS is stateful
Are there any other parts of HTTP/2 that are stateful?
HTTP/2 adds many stateful components to the HTTP corpus.
Streams use a stateful unsigned 31-bit integer called a "stream identifier".
Header blocks are used to statefully establish the stream identifier.
Frames are stateful.
Header compression is stateful.
Opportunistic encryption is stateful.
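To make the stream-identifier state concrete, here is a minimal Python sketch of how one endpoint might hand out identifiers under the rules in RFC 7540 Section 5.1.1 (odd for client-initiated streams, even for server-initiated ones, strictly increasing, never reused). The class name `StreamIdAllocator` is hypothetical and not taken from any real HTTP/2 library; the point is only that the "next identifier" counter must be remembered for the life of the connection.

```python
MAX_STREAM_ID = 2**31 - 1  # stream identifiers are unsigned 31-bit integers


class StreamIdAllocator:
    """Allocates identifiers for streams opened by one endpoint (illustrative sketch).

    Per RFC 7540 Section 5.1.1: client-initiated streams use odd identifiers,
    server-initiated streams use even ones, each new identifier must be larger
    than every identifier the endpoint used before, and identifiers are never
    reused. Stream 0 is reserved for connection-level frames.
    """

    def __init__(self, is_client: bool) -> None:
        # This counter is per-connection state: it must survive across requests.
        self._next_id = 1 if is_client else 2

    def next_id(self) -> int:
        if self._next_id > MAX_STREAM_ID:
            # No identifiers left; the RFC's remedy is to open a new connection.
            raise RuntimeError("stream identifiers exhausted; open a new connection")
        stream_id = self._next_id
        self._next_id += 2
        return stream_id


client = StreamIdAllocator(is_client=True)
print([client.next_id() for _ in range(3)])  # [1, 3, 5]
```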
The following stateful components were carried over from earlier HTTP additions and remain stateful in HTTP/2.
Cookies are stateful.
HTTPS is stateful.
HTTP caching is stateful. (See also RFC 7234.)
HTTP-defined authentication is stateful. (See also RFC 7235.)
WebSockets are stateful and use a stateful "Sec-WebSocket-Key" HTTP header.
Web Storage, although not HTTP, is also stateful.
Related
I was wondering how HTTP can be stateless when it's built on top of TCP, which is stateful.
I'm still a beginner backend engineer and I don't have a solid understanding of these topics.
I tried to search for explanations but I'm not sure if this question has been asked before.
There are transport layer (TCP) states and application layer (HTTP) states.
When talking about TCP being stateful one is talking about transport layer states. TCP is stateful because a transport layer state consisting of current sequence numbers etc is needed to provide the reliability guarantees of TCP, i.e. ordering of packets, removing of duplicates, acknowledgements and retransmission. Thus a state spanning over multiple "units" (packets) is needed.
In HTTP this unit is the HTTP message, i.e. the HTTP request from the client and the HTTP response from the server. When talking about HTTP being stateless it means that there is no state inside the HTTP protocol needed which spans multiple such messages: a response strictly follows a request and there is no state covering multiple requests or responses - all requests are independent from each other from the perspective of HTTP.
Within web applications themselves, though, some state is usually needed, such as a user session. This state is implemented on top of HTTP, usually with cookies shared between requests. It is therefore independent of any specific HTTP request and also independent of the underlying TCP connection.
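As an illustration of state built on top of stateless HTTP messages, here is a small Python sketch using the standard library's `http.cookies.SimpleCookie`. The names `handle_request`, `SESSIONS`, and the cookie name `sid` are hypothetical, and the in-memory dictionary stands in for whatever session store a real application would use. Each call sees only its own Cookie header; the continuity between calls comes entirely from the cookie plus the server-side store.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional, Tuple

# Hypothetical in-memory session store. The HTTP messages carry only an
# opaque session id; the session data itself lives here, outside HTTP.
SESSIONS: Dict[str, Dict[str, int]] = {}


def handle_request(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Handle one self-contained request, given its raw Cookie header (or None).

    Returns the response body and, for a new session, a Set-Cookie header value.
    """
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)
    session_id = jar["sid"].value if "sid" in jar else None

    set_cookie = None
    if session_id not in SESSIONS:
        # Missing or unknown session: create one and hand its id to the client.
        session_id = secrets.token_urlsafe(16)
        SESSIONS[session_id] = {"visits": 0}
        outgoing = SimpleCookie()
        outgoing["sid"] = session_id
        outgoing["sid"]["httponly"] = True
        set_cookie = outgoing["sid"].OutputString()  # e.g. "sid=...; HttpOnly"

    SESSIONS[session_id]["visits"] += 1
    return f"visit #{SESSIONS[session_id]['visits']}", set_cookie


body, set_cookie = handle_request(None)                   # first request, no cookie
body, _ = handle_request(set_cookie.split(";")[0])        # client echoes "sid=..."
print(body)  # visit #2
```

Note that neither the TCP connection nor any HTTP-level mechanism links the two requests; dropping the cookie (or the store) makes the server treat the client as new.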
What does "operations are done with gRPC over HTTP/2" mean? I am interested in knowing how gRPC and HTTP/2 work together.
gRPC is a protocol that uses HTTP/2. The messages you send are encoded as gRPC frames (5 byte header) and packaged into HTTP/2 DATA frames. The HTTP/2 HEADERS frames are used to propagate headers and trailers at the beginning and end of the call.
It would be possible to use gRPC over other protocols, though this is less common as of this writing. For example:
In-process: gRPC can be used in-process, meaning there is no wire encoding. You still get to use the same gRPC API and stubs, though. This is commonly used for testing.
QUIC: This is a UDP-based protocol that is an alternative to HTTP/2, but which has HTTP semantics. This is used on Android Java when using the AndroidChannelBuilder.
HTTP/1.1: This is used for gRPC Web. Some minor modifications are needed to the gRPC protocol, but it can work from regular web browsers which currently don't support certain parts of HTTP/2.
From my understanding, HTTP/2 comes with a stateful header compression called HPACK. Doesn't it change the stateless semantics of the HTTP protocol? Is it safe for web applications to consider HTTP/2 as a stateless protocol? Finally, will HTTP/2 be compatible with the existing load balancers?
HTTP/2 is stateless.
Original HTTP is a stateless protocol, meaning that each request message can be understood in isolation. Every request needs to carry as much detail as the server needs to serve it, without the server having to store a lot of information and metadata from previous requests.
HTTP/2 doesn't change this paradigm, so it works the same way: statelessly.
The official RFCs make this clear as well. RFC 2616 (HTTP/1.1) states:
The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol which can be used for many tasks...
and RFC 7540, which defines HTTP/2, says:
This specification describes an optimized expression of the semantics of the Hypertext Transfer Protocol (HTTP), referred to as HTTP version 2 (HTTP/2)... This specification is an alternative to, but does not obsolete, the HTTP/1.1 message syntax. HTTP's existing semantics remain unchanged.
Conclusion
The HTTP/2 protocol is stateless by design, as its semantics remain unchanged compared to the original HTTP.
Where the confusion may come from
HTTP/2 is an application-layer protocol running on top of a TCP connection (by the way, nothing stops you from using HTTP over UDP, for example; it's possible, but UDP is not used because it is not a "reliable transport"). Don't confuse it with the session and transport layers. The HTTP protocol is stateless by design.
Running HTTP over an encrypted SSL/TLS connection changes nothing about this statement either, as the S in HTTPS concerns the transport, not the protocol itself.
HPACK, Header Compression for HTTP/2, is a compression format crafted especially for HTTP/2 headers, and it is specified in a separate document (RFC 7541). It doesn't change HTTP/2 itself, so it doesn't change the semantics.
The HTTP/2 RFC states in its section about header compression:
Header compression is stateful. One compression context and one decompression context are used for the entire connection.
And here's why from HPACK's RFC:
2.2. Encoding and Decoding Contexts
To decompress header blocks, a decoder only needs to maintain a dynamic table (see Section 2.3.2) as a decoding context. No other dynamic state is needed.
When used for bidirectional communication, such as in HTTP, the encoding and decoding dynamic tables maintained by an endpoint are completely independent, i.e., the request and response dynamic tables are separate.
HPACK reduces the length of header field encoding by exploiting the redundancy inherent in protocols like HTTP. The ultimate goal of this is to reduce the amount of data that is required to send HTTP requests or responses.
An HPACK implementation cannot be completely stateless, because an endpoint has to maintain its encoding and decoding tables, which are completely independent of each other.
At the same time, there are libraries that try to work around this, for example the stateless, event-driven HPACK codec CASHPACK:
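To see why the dynamic table is connection state, here is a simplified Python sketch of an HPACK dynamic table following RFC 7541: entry size is name length plus value length plus 32 octets, new entries go to the front, the oldest entries are evicted when the size limit (4096 octets by default) would be exceeded, and dynamic indices start at 62 because the static table occupies 1 to 61. The class name `DynamicTable` is hypothetical; static-table lookups, Huffman coding, and the wire encoding itself are omitted. A decoder that had not processed the earlier header block would hold an empty table and could not resolve index 62.

```python
from collections import deque
from typing import Deque, Tuple

STATIC_TABLE_LEN = 61  # HPACK's static table; dynamic indices start at 62
ENTRY_OVERHEAD = 32    # RFC 7541 Section 4.1: size = len(name) + len(value) + 32


class DynamicTable:
    """Illustrative HPACK dynamic table (one per direction, per connection)."""

    def __init__(self, max_size: int = 4096) -> None:
        self.max_size = max_size
        self.entries: Deque[Tuple[bytes, bytes]] = deque()  # newest entry first
        self.size = 0

    @staticmethod
    def entry_size(name: bytes, value: bytes) -> int:
        return len(name) + len(value) + ENTRY_OVERHEAD

    def add(self, name: bytes, value: bytes) -> None:
        """Insert a header field, evicting the oldest entries until it fits."""
        needed = self.entry_size(name, value)
        while self.entries and self.size + needed > self.max_size:
            old_name, old_value = self.entries.pop()
            self.size -= self.entry_size(old_name, old_value)
        # An entry larger than max_size just leaves the table empty (Section 4.4).
        if needed <= self.max_size:
            self.entries.appendleft((name, value))
            self.size += needed

    def lookup(self, index: int) -> Tuple[bytes, bytes]:
        """Resolve a dynamic-table index (static-table lookups are omitted here)."""
        position = index - STATIC_TABLE_LEN - 1
        if not 0 <= position < len(self.entries):
            raise IndexError(f"index {index} is not in the dynamic table")
        return self.entries[position]


# Encoder and decoder each keep their own copy, updated by the same header blocks.
encoder, decoder = DynamicTable(), DynamicTable()
for table in (encoder, decoder):
    table.add(b"custom-key", b"custom-header")
print(decoder.lookup(62), decoder.size)  # (b'custom-key', b'custom-header') 55
```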
An HPACK implementation cannot be completely stateless, because a dynamic table needs to be maintained. Relying on the assumption that HTTP/2 will always decode complete HPACK sequences, statelessness is achieved using an event-driven API.
Modern HTTP, including HTTP/2, is a stateful protocol. Old-timey HTTP was stateless.
Many HTTP/2 components are the very definition of stateful.
No reasonable person can read the HTTP/2 RFC and think it is stateless. The errant old-time dogma that "HTTP is stateless" is false and doesn't represent the current reality of HTTP.
Here's a limited, non-exhaustive list of stateful HTTP/1 and HTTP/2 components:
Cookies (named "HTTP State Management Mechanism" by the RFC)
HTTPS, which stores keys and thus state
HTTP authentication requires state
Web Storage
HTTP caching is stateful
The very purpose of the stream identifier is state
Header blocks, which establish stream identifiers, are stateful.
Frames which reference stream identifiers are stateful
Header compression, which the HTTP/2 RFC explicitly says is stateful, is stateful.
Opportunistic encryption is stateful.
Section 5.1 ("Stream States") of the HTTP/2 RFC is a great example of stateful mechanisms defined by the HTTP/2 standard.
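A rough Python sketch of part of that Section 5.1 state machine is below. It is deliberately simplified: PUSH_PROMISE and the two "reserved" states are left out, a HEADERS frame carrying END_STREAM is modelled as two separate events, and the event names (`send_headers`, `recv_end_stream`, and so on) are hypothetical labels rather than any library's API. Even this subset shows that an endpoint must remember each stream's state to decide whether the next frame is legal.

```python
from typing import Dict, Tuple

# Simplified subset of the stream lifecycle in RFC 7540 Section 5.1.
# PUSH_PROMISE and the "reserved" states are omitted; a HEADERS frame that
# carries END_STREAM would be fed in as two events.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("idle", "send_headers"): "open",
    ("idle", "recv_headers"): "open",
    ("open", "send_end_stream"): "half-closed (local)",
    ("open", "recv_end_stream"): "half-closed (remote)",
    ("half-closed (local)", "recv_end_stream"): "closed",
    ("half-closed (remote)", "send_end_stream"): "closed",
}
RESET_EVENTS = {"send_rst_stream", "recv_rst_stream"}


class Stream:
    def __init__(self, stream_id: int) -> None:
        self.stream_id = stream_id
        self.state = "idle"  # every stream starts idle

    def on(self, event: str) -> str:
        """Apply one frame event and return the new state."""
        if event in RESET_EVENTS and self.state != "idle":
            # RST_STREAM moves any non-idle stream straight to closed.
            self.state = "closed"
            return self.state
        try:
            self.state = TRANSITIONS[(self.state, event)]
        except KeyError:
            raise ValueError(f"{event!r} is not allowed in state {self.state!r}") from None
        return self.state


stream = Stream(1)
for event in ("send_headers", "send_end_stream", "recv_end_stream"):
    print(stream.on(event))  # prints open, then half-closed (local), then closed
```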
Is it safe for web applications to consider HTTP/2 as a stateless protocol?
HTTP/2 is a stateful protocol, but that doesn't mean your HTTP/2 application can't be stateless. You can keep an HTTP/2 application stateless by using only a subset of HTTP/2 features and not relying on certain stateful ones.
Cookies and some other, less obvious stateful mechanisms are later additions to HTTP. HTTP/1 is said to be stateless, although in practice we use standardized stateful mechanisms. Unlike HTTP/1.0, HTTP/2 defines stateful components in its standard and is therefore stateful. A particular HTTP/2 application can use a subset of HTTP/2 features to remain stateless.
Existing applications that need state, even HTTP/1 applications, will break if you try to use them statelessly. For example, it can be impossible to log into some HTTP/1.1 websites if cookies are disabled. It may not be safe to assume that a particular HTTP/1 application does not use state, and this is no different for HTTP/2. Before Netscape invented cookies and HTTPS in 1994, HTTP could be considered stateless.
Say it with me one last time:
HTTP/2 is a stateful protocol.
I have read parts of the HTTP/2 specification (RFC 7540), but I don't fully understand the difference between the WINDOW_UPDATE and SETTINGS frames. What is it?
As stated in RFC 7540, WINDOW_UPDATE is used to implement flow control, while SETTINGS carries configuration parameters that the receiving endpoint must apply.
A client establishes a TCP connection to the server. As part of its connection preface, it then sends a SETTINGS frame (which may be empty) to inform the server of the configuration parameters it wants the server to honor.
For example, the client can tell the server that it does not support server push by setting SETTINGS_ENABLE_PUSH to 0 (see https://www.rfc-editor.org/rfc/rfc7540#section-6.5.2).
Likewise, the server sends the client a SETTINGS frame, as the first frame of its own connection preface, containing its configuration parameters.
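Here is a small Python sketch of what such a SETTINGS frame looks like on the wire, based on RFC 7540 Sections 4.1 and 6.5: a 9-byte frame header (24-bit length, 8-bit type, 8-bit flags, 1 reserved bit plus a 31-bit stream identifier), followed by 6 bytes per parameter (16-bit identifier, 32-bit value). The helper names are hypothetical. SETTINGS always travels on stream 0 because it applies to the whole connection, and the receiver acknowledges it with an empty SETTINGS frame that has the ACK flag set.

```python
import struct
from typing import Dict

FRAME_TYPE_SETTINGS = 0x4
FLAG_ACK = 0x1
SETTINGS_ENABLE_PUSH = 0x2


def frame_header(length: int, frame_type: int, flags: int, stream_id: int) -> bytes:
    """9-byte HTTP/2 frame header: 24-bit length, type, flags, R bit + 31-bit stream id."""
    length_bytes = struct.pack(">I", length)[1:]  # keep the low 3 bytes
    return length_bytes + struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)


def settings_frame(settings: Dict[int, int]) -> bytes:
    """Encode a SETTINGS frame; each parameter is a 16-bit id plus a 32-bit value."""
    payload = b"".join(struct.pack(">HI", ident, value) for ident, value in settings.items())
    # SETTINGS describes the whole connection, so it is always sent on stream 0.
    return frame_header(len(payload), FRAME_TYPE_SETTINGS, 0, 0) + payload


def settings_ack() -> bytes:
    """The peer confirms it applied the settings with an empty SETTINGS + ACK frame."""
    return frame_header(0, FRAME_TYPE_SETTINGS, FLAG_ACK, 0)


# A client announcing that it does not want server push.
print(settings_frame({SETTINGS_ENABLE_PUSH: 0}).hex())
```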
Flow control, on the other hand, limits how many data bytes each endpoint can send, both on each individual stream and on the connection as a whole.
The only frame that is subject to flow control is the DATA frame.
Flow control is a necessary mechanism for multiplexed protocols: it ensures that streams sharing the same connection do not destructively interfere with each other. Refer to Section 5.2 of RFC 7540 for further details.
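The bookkeeping behind WINDOW_UPDATE can be sketched in a few lines of Python. This follows the rules in RFC 7540 Sections 5.2 and 6.9 as I understand them: windows start at 65,535 octets, a DATA frame consumes both its stream's window and the connection window, a WINDOW_UPDATE on stream 0 replenishes the connection window, increments must be between 1 and 2^31-1, and a window may never exceed 2^31-1. The class `SendWindows` is hypothetical and tracks the sender side only.

```python
from typing import Dict

INITIAL_WINDOW = 65_535  # default window for the connection and each new stream
MAX_WINDOW = 2**31 - 1


class SendWindows:
    """Sender-side flow-control bookkeeping (simplified sketch)."""

    def __init__(self) -> None:
        self.connection = INITIAL_WINDOW
        self.streams: Dict[int, int] = {}

    def open_stream(self, stream_id: int) -> None:
        self.streams[stream_id] = INITIAL_WINDOW

    def can_send(self, stream_id: int, n: int) -> bool:
        # A DATA frame must fit in BOTH the stream window and the connection window.
        return n <= self.streams[stream_id] and n <= self.connection

    def send_data(self, stream_id: int, n: int) -> None:
        if not self.can_send(stream_id, n):
            raise RuntimeError("blocked by flow control; wait for WINDOW_UPDATE")
        self.streams[stream_id] -= n
        self.connection -= n

    def on_window_update(self, stream_id: int, increment: int) -> None:
        """Apply a received WINDOW_UPDATE; stream 0 means the whole connection."""
        if not 1 <= increment <= MAX_WINDOW:
            raise ValueError("invalid WINDOW_UPDATE increment")
        current = self.connection if stream_id == 0 else self.streams[stream_id]
        if current + increment > MAX_WINDOW:
            raise OverflowError("flow-control window would exceed 2**31 - 1")
        if stream_id == 0:
            self.connection += increment
        else:
            self.streams[stream_id] += increment


windows = SendWindows()
windows.open_stream(1)
windows.send_data(1, 65_535)          # both windows are now exhausted
windows.on_window_update(1, 1_000)    # stream window reopened...
print(windows.can_send(1, 1))         # False: the connection window is still 0
```

The two frame types do meet in one place: the SETTINGS_INITIAL_WINDOW_SIZE parameter changes the starting window of streams, while the connection window can only be changed with WINDOW_UPDATE.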
In summary, the WINDOW_UPDATE and SETTINGS frames each implement a different part of the HTTP/2 protocol.
HTTP, the protocol running over TCP, is stateless, and the IP protocol is also stateless.
But how can we determine whether TCP is stateless or not?
You can't assume that any stacked protocol is stateful or stateless just looking at the other protocols on the stack. Stateful protocols can be built on top of stateless protocols and stateless protocols can be built on top of stateful protocols. One of the points of a layered network model is that the kind of relationship you're looking for (statefulness of any given protocol in function of the protocols it's used in conjunction with) does not exist.
The TCP protocol is a stateful protocol because of what it is, not because it is used over IP or because HTTP is built on top of it. TCP maintains state in the form of a window size (endpoints tell each other how much data they're ready to receive) and packet order (endpoints must confirm to each other when they receive a packet from the other). This state (how many bytes the other side can receive, and whether or not it received the last packet) allows TCP to be reliable even over inherently unreliable protocols. Therefore, TCP is a stateful protocol because it needs state to be useful.
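To illustrate the kind of receive-side state described here, below is a toy Python model, not real TCP: it ignores 32-bit sequence wraparound, random initial sequence numbers, checksums, and the window itself. The class name `ToyReceiver` is hypothetical. Each segment is a (sequence number, payload) pair, where the sequence number is the byte offset of the first payload byte. The receiver remembers the next byte it expects, buffers early segments, drops duplicates, and answers every segment with a cumulative acknowledgement. Without that remembered `expected` value, none of these guarantees would be possible.

```python
from typing import Dict


class ToyReceiver:
    """Toy model of TCP-style receive state (ordering, de-duplication, ACKs)."""

    def __init__(self, initial_seq: int = 0) -> None:
        self.expected = initial_seq                 # next in-order byte we want
        self.out_of_order: Dict[int, bytes] = {}    # segments that arrived early
        self.delivered = bytearray()                # bytes handed to the application

    def receive(self, seq: int, payload: bytes) -> int:
        """Process one segment and return the cumulative ACK number."""
        if seq + len(payload) <= self.expected:
            return self.expected                    # duplicate: re-ACK, deliver nothing
        if seq > self.expected:
            self.out_of_order[seq] = payload        # gap before it: buffer for later
            return self.expected
        self._deliver(seq, payload)
        # Drain buffered segments that have become contiguous.
        while True:
            ready = [s for s in self.out_of_order if s <= self.expected]
            if not ready:
                break
            for s in ready:
                self._deliver(s, self.out_of_order.pop(s))
        return self.expected

    def _deliver(self, seq: int, payload: bytes) -> None:
        new_bytes = payload[self.expected - seq:]   # skip bytes already delivered
        self.delivered += new_bytes
        self.expected += len(new_bytes)


rx = ToyReceiver()
print(rx.receive(3, b"def"))   # 3 bytes missing before it, so ACK stays at 0
print(rx.receive(0, b"abc"))   # gap filled: ACK jumps to 6
print(bytes(rx.delivered))     # b'abcdef'
```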
I would also like to point out that while HTTP and HTTPS (which is just HTTP over SSL/TLS, really) are essentially stateless (each request is a valid standalone request per the protocol), applications built on top of HTTP and HTTPS aren't necessarily stateless. For instance, a website can require you to visit a login page before sending a message. Even though the request where the client sends a message is a valid standalone request, the application will not accept it unless the client authenticated herself before. This means that the application implements state over HTTP.
On a side note, the statefulness of HTTP can be somewhat confusing, as several applications (on a clearly different OSI layer) will leak their state to HTTP. For instance, if a user tries to view a blog post that doesn't exist, the blog application might send back a response with the 404 status code, even though the file handling the blog post search itself was found.
tl;dr TCP is stateful.
While Zneak points out that you can use any communication for stateful purposes, the ACTUAL question being asked is whether the protocol itself is stateful.
Wikipedia:
In computing, a stateless protocol is a communications protocol that treats each request as an independent transaction that is unrelated to any previous request so that the communication consists of independent pairs of requests and responses. A stateless protocol does not require the server to retain session information or status about each communications partner for the duration of multiple requests. In contrast, a protocol which requires keeping of the internal state on the server is known as a stateful protocol.
TCP's "request" (unit of communication) is a TCP packet.
TCP is a stateful protocol because the parties must remember what state the other is in and which bytes the other has. Hence the TCP state diagram.
In contrast, UDP is a stateless protocol. Neither endpoint retains any notion of state. (Though as always, the encapsulated information could be used for stateful purposes.)
Here is a nice explanation:
Consider the phone service to be TCP and consider your relationship with distant family members to be HTTP. You contact them using the phone service. Each call to them would be a stateful TCP connection. However, you don't constantly stay on the phone with them; you hang up and call them back at a later time. You would certainly expect them to remember what you talked about on the last call. HTTP in itself does not do that; rather, it is a function of the web server that maintains the state of the overall conversation.
To properly answer the question, we need the concept of a stateless protocol used to manage external stateful resources. Section 2.4 of http://laurel.datsi.fi.upm.es/_media/docencia/asignaturas/ws-modelingresources.pdf is about a service that implements such a protocol:
A Service that acts upon stateful resources may be described “stateless” if it delegates responsibility for the management of the state to another component such as a database or file system. ... A consequence of statelessness is that any dynamic state needed for a given message-exchange execution must be:
provided explicitly within the request message, whether directly by-value or indirectly by-reference, and/or
maintained implicitly within other system components with which the Web service can interact.
So the HTTP protocol is stateless if we consider the files that are served, the database that is accessed, and so on to be separate from the implementation of the protocol itself. A service (which implements a protocol) that is stateless with respect to both sides taken together might not appear stateless from each side individually, because the other side can carry state.
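The quoted definition allows state to be "provided explicitly within the request message ... directly by-value". One common way to do that is a signed token: the server puts the state in the token, signs it, and stores nothing, so every request is self-contained. Below is a minimal Python sketch using the standard `hmac`, `hashlib`, `base64`, and `json` modules. The names `issue_token`, `read_token`, and `SECRET_KEY` are hypothetical, and the sketch has no expiry and no encryption (the contents are readable by the client, only tamper-evident), so it illustrates the idea rather than a production scheme.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"change-me"  # hypothetical server-side key; never hard-code a real one


def issue_token(state: dict) -> str:
    """Serialize the state into the token itself and sign it (state by value)."""
    body = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode())
    signature = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + signature


def read_token(token: str) -> dict:
    """Verify the signature and recover the state; the server kept no copy."""
    body, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("token was tampered with")
    return json.loads(base64.urlsafe_b64decode(body))


token = issue_token({"user": "alice", "logged_in": True})
print(read_token(token))  # {'logged_in': True, 'user': 'alice'}
```

The by-reference alternative from the same quote is a session id pointing into a server-side store, which moves the state into "other system components" instead of the message.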