JSON/XML-RPC over TCP and Message Framing - tcp

I was thinking about JSON/XML-RPC over HTTP vs TCP. In case of HTTP, the HTTP request and response provide a message framing mechanism. However, since TCP is a stream,
How are RPC messages framed?
Does the RPC spec dictate that?
Are there any other standards defining framing mechanisms?
Is there one mechanism more often used than the other?
I'm trying to gauge this before inventing a framing mechanism.

There are many framing standards, HTTP being one of them. WebSocket is another; it is layered on top of HTTP and is better suited to bidirectional streams.
JSON-RPC 2.0 intentionally does not concern itself with transport.
(1.0 included some transport specifics, which were removed in 2.0.)

RFC 7464 provides a framing standard for "JSON text sequences": https://www.rfc-editor.org/rfc/rfc7464
Summary: Each JSON message is prefixed with a 0x1E byte (which can't appear unescaped in a JSON message) and is suffixed with 0x0A (linefeed).
Note, however, that this is not part of the JSON-RPC specification. There are libraries that support it as part of their JSON-RPC implementation.
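As a rough illustration of the RFC 7464 framing summarized above, here is a minimal Python sketch. The names encode_record and SequenceDecoder are made up for this example, and it assumes the sender emits compact JSON with no raw line feeds inside a record, so a line feed can be treated as the end of a record. A stricter parser would split on 0x1E instead.

```python
import json

RS = b"\x1e"  # record separator: starts every record
LF = b"\n"    # line feed: ends every record


def encode_record(obj) -> bytes:
    """Frame one JSON value as RS + JSON text + LF."""
    return RS + json.dumps(obj).encode("utf-8") + LF


class SequenceDecoder:
    """Incrementally split a byte stream into JSON values (hypothetical helper)."""

    def __init__(self):
        self._buf = b""

    def feed(self, data: bytes) -> list:
        """Add newly received bytes and return every complete value so far."""
        self._buf += data
        values = []
        while True:
            start = self._buf.find(RS)
            if start == -1:
                self._buf = b""  # no record start seen: drop stray bytes
                return values
            end = self._buf.find(LF, start + 1)
            if end == -1:
                self._buf = self._buf[start:]  # keep the partial record
                return values
            values.append(json.loads(self._buf[start + 1:end]))
            self._buf = self._buf[end + 1:]
```

Since TCP may deliver a record in several pieces, the decoder keeps the unfinished tail between calls to feed. For example, encode_record({"id": 1}) yields b'\x1e{"id": 1}\n'.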

Related

What does "gRPC over HTTP/2" mean?

What does it mean that operations are done with gRPC over HTTP/2? I am interested in knowing how gRPC and HTTP/2 work together.
gRPC is a protocol that uses HTTP/2. The messages you send are encoded as gRPC frames (5 byte header) and packaged into HTTP/2 DATA frames. The HTTP/2 HEADERS frames are used to propagate headers and trailers at the beginning and end of the call.
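To make the 5-byte header concrete: it is a 1-byte compressed flag followed by a 4-byte big-endian message length. The sketch below frames and unframes arbitrary byte payloads in that shape. The function names are invented for illustration, the payload would normally be a serialized protobuf message, and the HTTP/2 DATA and HEADERS frames around it are not modeled at all.

```python
import struct

# 1-byte compressed flag + 4-byte big-endian unsigned length = 5 bytes
HEADER = struct.Struct(">BI")


def frame_message(payload: bytes, compressed: bool = False) -> bytes:
    """Prefix a payload with the 5-byte length-prefixed-message header."""
    return HEADER.pack(1 if compressed else 0, len(payload)) + payload


def unframe_messages(data: bytes):
    """Yield (compressed, payload) for each complete message in data."""
    offset = 0
    while offset + HEADER.size <= len(data):
        flag, length = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        if start + length > len(data):
            break  # the rest of this message has not arrived yet
        yield bool(flag), data[start:start + length]
        offset = start + length
```

Here frame_message(b"hello") produces b"\x00\x00\x00\x00\x05hello".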
It would be possible to use gRPC over other protocols, though this is less common as of this writing. For example:
In-process: gRPC can be used in process, meaning there is no wire encoding. You still get to use the same gRPC API and stubs though. This is commonly used for testing.
QUIC: This is a UDP-based protocol that is an alternative to HTTP/2, but which has HTTP semantics. This is used on Android Java when using the CronetChannelBuilder.
HTTP/1.1: This is used for gRPC Web. Some minor modifications are needed to the gRPC protocol, but it can work from regular web browsers which currently don't support certain parts of HTTP/2.

Is HTTP/2 a stateless protocol?

From my understanding, HTTP/2 comes with a stateful header compression called HPACK. Doesn't it change the stateless semantics of the HTTP protocol? Is it safe for web applications to consider HTTP/2 as a stateless protocol? Finally, will HTTP/2 be compatible with the existing load balancers?
HTTP/2 is stateless.
Original HTTP is a stateless protocol, meaning that each request message can be understood in isolation. This means that every request needs to bring with it as much detail as the server needs to serve that request, without the server having to store a lot of info and meta-data from previous requests.
Since HTTP/2 doesn't change this paradigm, it has to work the same way, stateless.
It's clearly visible from official RFCs as well. It is stated:
The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol which can be used for many tasks...
and the definition of HTTP/2 says:
This specification describes an optimized expression of the semantics of the Hypertext Transfer Protocol (HTTP), referred to as HTTP version 2 (HTTP/2)... This specification is an alternative to, but does not obsolete, the HTTP/1.1 message syntax. HTTP's existing semantics remain unchanged.
Conclusion
HTTP/2 protocol is stateless by design, as semantics remain unchanged in comparison to original HTTP.
From where confusion may come
An HTTP/2 connection is an application-layer protocol running on top of a TCP connection (BTW, nothing stops you from using HTTP over UDP, for example; it's possible, but UDP is not used because it is not a "reliable transport"). Don't confuse it with the session and transport layers. The HTTP protocol is stateless by design.
HTTP over an encrypted SSL/TLS connection also changes nothing about this statement, as the S in HTTPS is concerned with the transport, not the protocol itself.
HPACK, Header Compression for HTTP/2, is a compression format especially crafted for HTTP/2 headers, and it is specified in a separate document (RFC 7541). It doesn't change HTTP/2 itself, so it doesn't change the semantics.
The HTTP/2 RFC, in its section on header compression, states:
Header compression is stateful. One compression context and one
decompression context are used for the entire connection.
And here's why from HPACK's RFC:
2.2. Encoding and Decoding Contexts
To decompress header blocks, a decoder only needs to maintain a
dynamic table (see Section 2.3.2) as a decoding context. No other
dynamic state is needed.
When used for bidirectional communication, such as in HTTP, the
encoding and decoding dynamic tables maintained by an endpoint are
completely independent, i.e., the request and response dynamic tables
are separate.
HPACK reduces the length of header field encoding by exploiting the
redundancy inherent in protocols like HTTP. The ultimate goal of
this is to reduce the amount of data that is required to send HTTP
requests or responses.
An HPACK implementation cannot be completely stateless, because the encoding and decoding tables, completely independent, have to be maintained by an endpoint.
At the same time, there are libraries, which try to solve HPACK issues, for example, a stateless event-driven HPACK codec CASHPACK:
An HPACK implementation cannot be completely stateless, because a dynamic table needs to be maintained. Relying on the assumption that HTTP/2 will always decode complete HPACK sequences, statelessness is achieved using an event-driven API.
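To show why the dynamic table makes a decoder stateful, here is a deliberately simplified Python model. It is not the HPACK wire format: the instruction tuples, the 0-based indexing, and the names DynamicTable and decode_block are all assumptions made for this sketch (real HPACK uses a binary encoding and addresses the dynamic table after a 61-entry static table). The entry size rule (name length + value length + 32) and the eviction behavior follow RFC 7541.

```python
from collections import deque


class DynamicTable:
    """Toy per-connection HPACK-style dynamic table (newest entry first)."""

    def __init__(self, max_size: int = 4096):
        self.max_size = max_size
        self.size = 0
        self.entries = deque()

    @staticmethod
    def entry_size(name: str, value: str) -> int:
        # RFC 7541: octets of name + octets of value + 32 (ASCII assumed here)
        return len(name) + len(value) + 32

    def add(self, name: str, value: str) -> None:
        needed = self.entry_size(name, value)
        # Evict the oldest entries until the new one fits.
        while self.entries and self.size + needed > self.max_size:
            old = self.entries.pop()
            self.size -= self.entry_size(*old)
        if needed <= self.max_size:  # an oversized entry just empties the table
            self.entries.appendleft((name, value))
            self.size += needed

    def lookup(self, index: int):
        return self.entries[index]  # index 0 is the most recent entry


def decode_block(table: DynamicTable, instructions):
    """Decode one header block given as simplified (op, ...) tuples."""
    headers = []
    for op, *args in instructions:
        if op == "literal":      # literal field that is also stored in the table
            name, value = args
            table.add(name, value)
            headers.append((name, value))
        elif op == "indexed":    # reference to a field stored by an earlier block
            headers.append(table.lookup(args[0]))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return headers
```

A block containing only ("indexed", 0) decodes correctly on a connection that has already seen the matching literal, and fails with IndexError on a fresh table. That dependence on earlier blocks is the connection-level state the quotes above describe; it lives in the codec, not in the request semantics.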
Modern HTTP, including HTTP/2, is a stateful protocol. Old timey HTTP was stateless.
Many HTTP/2 components are the very definition of stateful.
No reasonable person can read the HTTP/2 RFC and think it is stateless. The old-time "HTTP is stateless" dogma is false and doesn't represent the current reality of HTTP.
Here's a limited, and not exhaustive list, of stateful HTTP/1 and HTTP/2 components:
Cookies, (named "HTTP State Management Mechanism" by the RFC)
HTTPS, which stores keys thus state
HTTP authentication requires state
Web Storage
HTTP caching is stateful
The very purpose of the stream identifier is state
Header blocks, which establish stream identifiers, are stateful.
Frames which reference stream identifiers are stateful
Header Compression, which the HTTP RFC explicitly says is stateful, is stateful.
Opportunistic encryption is stateful.
Section 5.1 of the HTTP/2 RFC is a great example of stateful mechanisms defined by the HTTP/2 standard.
Is it safe for web applications to consider HTTP/2 as a stateless protocol?
HTTP/2 is a stateful protocol, but that doesn't mean your HTTP/2 application can't be stateless. You can choose to not use certain stateful features for stateless HTTP/2 applications by using only a subset of HTTP/2 features.
Cookies and some other stateful mechanisms, including less obvious ones, are later HTTP additions. HTTP 1 is said to be stateless, although in practice we use standardized stateful mechanisms. Unlike HTTP/1.0, HTTP/2 defines stateful components in its standard and is therefore stateful. A particular HTTP/2 application can use a subset of HTTP/2 features to maintain statelessness.
Existing applications, even HTTP 1 applications, that need state will break if you try to use them statelessly. It can be impossible to log into some HTTP/1.1 websites if cookies are disabled, thus breaking the application. It may not be safe to assume that a particular HTTP 1 application does not use state. This is no different for HTTP/2. Before Netscape invented cookies and HTTPS in 1994, HTTP could be considered stateless.
Say it with me one last time:
HTTP/2 is a stateful protocol.

How do XMPP/HTML/etc. *really* work?

This might be a dumb question, however, I have been continually frustrated by what seems to be a big gap in every explanation I've seen of protocols like XMPP or HTML. So basically, when I've read documentation on either, in general, it will describe the structure of the data sent back and forth through the protocol, but it does not explain exactly how this data is transferred. It's one thing to provide an example of, say, a generic HTTP request, but it is something else to explain how this text is actually sent to the server.
I guess posed another way, what resources are there out there for learning best practices for implementing text-based protocols? At their core, are all text-based protocols basically the exact same thing? How, for example, would it differ at the binary level, were I to say send the text content of an HTTP request over IRC vs however it is done natively by HTTP?
If I wanted to develop my own, simple textual protocol, what would be the best way to send the text to a client? Does the content itself even really matter? What I mean is that, obviously, HTTP and XMPP are rather different protocols, but do they differ in terms of how the text is transferred between computer to computer?
HTTP, IRC and XMPP are all sent on top of TCP, which is a protocol that provides a bidirectional stream between two endpoints (IP address + port). Under the hood, the data you send is split into separate packets, sent across the network, and reassembled on the other end, so that the recipient just sees a stream of incoming data, except when something goes wrong.
What that means is that while the application protocol (HTTP, XMPP etc) is different, the underlying transport mechanism is exactly the same. It would be possible (perhaps even interesting) to implement HTTP on top of IRC: an HTTP/IRC client enters a channel, sends the HTTP request as messages to the channel, line by line, a server is present in the channel, reads the request and sends the response the same way - but transporting HTTP over IRC is fundamentally different from transporting HTTP over TCP. The former means layering an application protocol over another application protocol (and the IRC connection needs to go over TCP anyway), while the latter is an application protocol over a transport protocol, which is the way things usually are done (except for various kinds of proxies).
Hope that makes some sense...
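To make "text sent over a TCP-style stream" concrete, here is a small Python sketch of an invented line-based protocol. The HELLO/OK/ERR commands and the function names are made up for this example, and socket.socketpair() stands in for a real TCP connection so the sketch runs without a network. The byte-level mechanics are the same ones HTTP or XMPP rely on: one side writes bytes, the other reads until it sees a delimiter it understands.

```python
import socket


def serve_one(conn: socket.socket) -> None:
    """Read one CRLF-terminated command and answer with one line."""
    with conn, conn.makefile("rwb") as stream:
        line = stream.readline()  # returns once b"\n" has arrived
        verb, _, arg = line.rstrip(b"\r\n").partition(b" ")
        if verb == b"HELLO":
            stream.write(b"OK " + arg + b"\r\n")
        else:
            stream.write(b"ERR unknown command\r\n")
        stream.flush()


def demo() -> bytes:
    """Send one command through a connected socket pair and return the reply."""
    client, server = socket.socketpair()
    client.sendall(b"HELLO alice\r\n")  # the "protocol" is just these bytes
    serve_one(server)
    with client, client.makefile("rb") as stream:
        return stream.readline()
```

Calling demo() returns b"OK alice\r\n". Swapping the grammar (for instance a request line plus headers, as in HTTP) changes what the bytes mean, not how they travel.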

validate SIP (Session Initiation Protocol)

What are the mechanisms/approaches to validate the UDP payload in the case of SIP? A SIP message doesn't contain the size of the header or the body, so how can I verify that the payload is valid? In contrast, RTP indicates the size, so given the length value from the UDP header it is possible to check RTP for validity and integrity. Can something similar be done for SIP?
Mark.
Your question has two parts:
How do I validate SIP headers? The only way to validate SIP headers is to parse them according to the rules of section 7.3.1 of RFC 3261. There are SIP parsers available for many different languages.
How do I validate the body of SIP messages? There is a mechanism built into SIP: the Content-Length header specifies the size of the body. In the general case, the body can contain an arbitrary MIME type, and no further validation rules apply.
Edit: Per Frank Shearar's comment below, Content-Length is not required for SIP messages conveyed via UDP. But if your UA supports it, you can take advantage of it.
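A shallow version of that check might look like the Python sketch below. It is not a full RFC 3261 parser: check_sip_payload is a hypothetical name, folded (continuation) header lines are ignored, and the only things verified are the start line, the blank line that ends the headers, and Content-Length (or its compact form l) against the bytes actually present in the datagram.

```python
def check_sip_payload(payload: bytes) -> bool:
    """Rough sanity check of one UDP datagram that should hold a SIP message."""
    head, sep, body = payload.partition(b"\r\n\r\n")
    if not sep:
        return False  # no blank line terminating the headers

    lines = head.split(b"\r\n")
    start_line = lines[0]
    is_request = start_line.endswith(b" SIP/2.0")
    is_response = start_line.startswith(b"SIP/2.0 ")
    if not (is_request or is_response):
        return False

    declared = None
    for line in lines[1:]:
        name, colon, value = line.partition(b":")
        if not colon:
            continue  # folded or malformed line: skipped in this sketch
        if name.strip().lower() in (b"content-length", b"l"):
            try:
                declared = int(value.strip())
            except ValueError:
                return False

    if declared is None:
        return True  # over UDP the body then runs to the end of the datagram
    # Extra trailing bytes are tolerated; a body shorter than declared is not.
    return 0 <= declared <= len(body)
```

For anything beyond a quick plausibility test, an existing SIP parser is the safer choice.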

When to use TCP and HTTP in node.js?

Stupid question, but just making sure here:
When should I use TCP over HTTP? Are there any examples where one is better than the other?
TCP provides full-duplex, two-way communication. HTTP uses a request/response model. Let's say you are writing a chat or messaging application. TCP will work much better because you can notify the client immediately, while with HTTP you have to resort to tricks like long-polling.
However, TCP is just a byte stream. You have to layer another protocol over it to define your messages. You can use Google's Protocol Buffers to encode them.
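One common way to define messages on a byte stream is a length prefix. The sketch below is in Python rather than Node.js, purely to show the idea; the 4-byte big-endian prefix and the function names are choices made for this example, not part of any standard. The payload could be a serialized Protocol Buffers message or any other bytes.

```python
import socket
import struct


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send one message as a 4-byte big-endian length followed by the payload."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv() may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock: socket.socket) -> bytes:
    """Read one length-prefixed message."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

The loop in recv_exact is the important part: the stream does not preserve the boundaries of individual sends, so the reader has to keep reading until the declared number of bytes has arrived.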
Use HTTP if you need the services it provides -- e.g., message framing, caching, redirection, content metadata, partial responses, content negotiation -- as well as a large number of well-understood tools, implementations, documentation, etc.
Use TCP if you can't work within those constraints. However, if you use TCP you'll be creating a new application protocol, which has a number of pitfalls.

Resources