Websocket frame-based - tcp

I read from here that Websocket is a frame-based protocol instead of stream-based. But it also states Why are WebSockets frame-based and not stream-based? I don’t know and just like you, I’d love to learn more, so if you have an idea, feel free to add comments and resources in the responses below.
Can anyone explain what is the advantage of using frame-based protocol in Websocket?

Maybe this pre-existing answer will help provide some reference for the discussion.
By utilizing frames and having a massage based based protocol (vs. a stream based protocol) it makes it easier to write Web oriented applications.
Common actions, such as sending JSON data, become easier and don't require every application to implement a network-data buffering/caching layer for fragmented messages.
EDIT (answering comment)
The TCP/IP layer guarantees network packet delivery and ordering, but has no concept of data length - it's a streaming protocol and it promises that the stream will arrive in order, no more than that.
If any data arrives out of order, than the TCP/IP protocol layer will re-order the data. This may require an internal cache/buffer that will hold on to the existing data while waiting for the missing data.
In contrast, the WebSocket is message based and is aware of the message data length.
The WebSocket frames use a header with data length (total / partial) to allow the WebSocket protocol layer to concat all the data as a single unit, even when it's distributed across multiple TCP/IP packets or (even) WebSocket frames.
This requires the protocol layer to keep the data in an internal buffer until all the expected data in the message arrives. The WebSocket protocol will forward the message to the application only when the whole of it's data arrives.
This "holding on to data" in order to extract message "units" from the stream is the caching/buffering element I was referring to.

Related

Do all protocols based on TCP use one socket per transfer?

I'm studying Socket Programming HOWTO and the author at some point says that
A protocol like HTTP uses a socket for only one transfer.
Is it because of the design of the HTTP protocol itself? Or is it because it is based on TCP, so all protocols based on it (e.g. UDP) must use one socket for only one transfer?
This statement is taken out of context. The context is to point out that TCP is not a message based protocol but an unstructured byte stream. And to have a message semantic one needs to have some way to determine where a message ends.
It then takes HTTP as an example where a message might simply end with a connection close and points out the limitations - namely only a single message per connection per direction. Then it goes on to describe how protocols can be designed without this limitation, i.e. having multiple messages per connection.
HTTP still can be used like this, i.e. have a single request and end with connection close. This is the design of HTTP version 0.9, but can still be done with HTTP/1. But with HTTP/1 it can also be used for multiple messages, one after the other. And with HTTP/2 it can do multiple messages in parallel, multiplexed over a single TCP connection. And HTTP/3 does not even use TCP anymore.
Do all protocols based on TCP use one socket per transfer?
Protocols are not limited to one connection ("socket") per message ("transfer"). Depending on the design of the protocol multiple messages can be send one after the other by having some pre-known message size or a clear message delimiter. Some protocols might send multiple messages in parallel by implementing a multiplexing layer on top of TCP. Some protocols might even use multiple TCP connections in parallel to deliver a single message, i.e. distributing the message over multiple connections.
That statement was probably written in 1996 or earlier. Since 1997, HTTP supports persistent connections, reusing the same TCP connection and the same socket for multiple queries.

What's the difference between WebSocket fragmentation and TCP fragmentation?

I'm reading about Websocket and I see that protocol have a data fragmentation (frames), a WebSocket message is composed of one or more frames, but it's not what TCP (fragmentation of data) do? I'm confused.
Fragmentation in the context of data transfer just means splitting the original data into smaller parts for transfer and combining these fragments later (for example at the recipients side) again to recreate the original data.
Fragmentation is often done if the underlying layer cannot handle larger messages or if larger messages will result in performance problems. Such problems might be because it is more expensive if one large message is lost and need to be repeated instead of only a small fragment. Or it can be a performance problem if the transfer of one large message would block the delivery of smaller messages. In this case it is useful to split the large message into fragments and deliver these message fragments together with the other messages so that these don't have to wait for delivery until the large message is done.
Fragmentation of messages in WebSockets is just one of the many types of fragmentation which exist at various layers at the data transport, like:
IP messages can be fragmented at the sender or some middlebox and get reassembled at the end.
TCP is a data stream. The various parts of the stream are transferred in different IP packets and get reassembled in the correct order at the recipient.
Application layer protocols like HTTP can have fragments too, for example the chunked Transfer-Encoding mode within HTTP or the fragments in WebSockets.
And at even higher layers there can be more fragments, like the spreading of a single large ZIP file into multiple parts onto floppy disks in former times or the accelerating of downloads by requesting different parts of the same file in parallel connections and combining these at the recipient.
I love the detailed answer by Steffen Ullrich, but I wish to add a few specific details regarding the differences between raw TCP/IP and the added Websockets layer.
TCP/IP is a stream protocol, meaning the application receives the data as fragmented pieces as data become available, with no clear indication of the fragmented "packet boundaries" or the original (non-fragmented) data structure.
The Websocket protocol is a message based protocol, meaning that the application will only receive the full Websocket message once all the fragmented pieces have arrived and put back together.
As a very simplified example:
TCP/IP: if a 50 Mb file is sent using TCP, the application will probably receive a piece of the file at a time and it will need to piece the file back together (possibly saving each piece to a temporary disk storage).
Websocket: if a 50 Mb file is sent using the Websocket protocol, the application will receive the whole of the 50Mb in one message (and the storage of all of the data, memory or disk, will be dictated by the Websocket layer, not the application layer).
Note that the Websocket Protocol is an additional layer over the TCP/IP protocol, so data is streamed over TCP/IP and the Websocket layer puts the pieces back together before forwarding the original (whole) message).
5.4. Fragmentation
A secondary use-case for fragmentation is for multiplexing, where it
is not desirable for a large message on one logical channel to
monopolize the output channel, so the multiplexing needs to be free
to split the message into smaller fragments to better share the
output channel. (Note that the multiplexing extension is not
described in this document.)
Even though it's listed as secondary reason, I'd say that't the primary reason for that fragmentation feature. Imagine, if you try to send first message with 1GB size and right away when you start sending it you also send second message with 1KB size. Framing allows applications to inject second message in between individual frames of the first message, this way receiver will not need to wait for 1GB to be transferred and will receive/handle 1KB second message right away.

HTTP vs TCP for online games

I am wondering about the difference between HTTP and TCP data transfer protocols for online games.
I have heard many people using TCP or UDP to transfer data between client and server for online games.
But can you use http at all? I know http is mostly used for web browsing, but if I could set up web server and let my game applications use GET and POST methods, I can still send data back and forth right? Is it that this way of communicating is too slow or unnecessary?
And just one thing about TCP transmission protocols, if I were to write some gaming application using TCP, is it that the data are usually transferred using something called "sockets" (like Socket classes in Java)? What about UDP?
Thanks very much!
Appreciate any answer!
HTTP is an additional layer on top of TCP that defines what a request looks like, what a response looks like, and how the connection is closed or maintained across requests. You can either use it or not use it, depending on what you actually need to transport. If your game consists of a series of requests that each get a reply, HTTP might make sense. If it's more like unsolicited messages in each direction, making HTTP work is like putting a square peg in a round hole.
Most platforms provide a socket interface that allows you to work with either TCP or UDP depending on the protocol specified when the socket is allocated. Some higher-level APIs look completely different for different protocols.

How do XMPP/HTML/etc. *really* work?

This might be a dumb question, however, I have been continually frustrated by what seems to be a big gap in every explanation I've seen of protocols like XMPP or HTML. So basically, when I've read documentation on either, in general, it will describe the structure of the data sent back and forth through the protocol, but it does not explain exactly how this data is transferred. It's one thing to provide an example of, say, a generic HTTP request, but it is something else to explain how this text is actually sent to the server.
I guess posed another way, what resources are there out there for learning best practices for implementing text-based protocols? At their core, are all text-based protocols basically the exact same thing? How, for example, would it differ at the binary level, were I to say send the text content of an HTTP request over IRC vs however it is done natively by HTTP?
If I wanted to develop my own, simple textual protocol, what would be the best way to send the text to a client? Does the content itself even really matter? What I mean is that, obviously, HTTP and XMPP are rather different protocols, but do they differ in terms of how the text is transferred between computer to computer?
HTTP, IRC and XMPP are all sent on top of TCP, which is a protocol that provides a bidirectional stream between two endpoints (IP address + port). Under the hood, the data you send is split into separate packets, sent across the network, and reassembled on the other end, so that the recipient just sees a stream of incoming data - except when something goes wrong; there is a somewhat accessible description here.
What that means is that while the application protocol (HTTP, XMPP etc) is different, the underlying transport mechanism is exactly the same. It would be possible (perhaps even interesting) to implement HTTP on top of IRC: an HTTP/IRC client enters a channel, sends the HTTP request as messages to the channel, line by line, a server is present in the channel, reads the request and sends the response the same way - but transporting HTTP over IRC is fundamentally different from transporting HTTP over TCP. The former means layering an application protocol over another application protocol (and the IRC connection needs to go over TCP anyway), while the latter is an application protocol over a transport protocol, which is the way things usually are done (except for various kinds of proxies).
Hope that makes some sense...

Is websocket a stream-based or package-based protocol?

Imagine that I have server and client talking via WebSocket. Each of time sends another chunks of data. Different chunks may have different length.
Am I guaranteed, that if server sends chunk in one call, then client will receive it in one message callback and vice-versa? I.e., does WebSocket have embedded 'packing' ability, so I don't have to care if my data is splitted among several callbacks during transmission or it doesn't?
Theoretically the WebSocket protocol presents a message based protocol. However, bear in mind that...
WebSocket messages consist of one or more frames.
A frame can be either a complete frame or a fragmented frame.
Messages themselves do not have any length indication built into the protocol, only frames do.
Frames can have a payload length of up to 9,223,372,036,854,775,807 bytes (due to the fact that the protocol allows for a 63bit length indicator).
The primary purpose of fragmentation is to allow sending a message that is of unknown size when the message is started without having to buffer that message.
So...
A single WebSocket "message" could consist of an unlimited number of 9,223,372,036,854,775,807 byte fragments.
This may make it difficult for an implementation to always deliver complete messages to you via its API...
So whilst, in the general case, the answer to your question is that the WebSocket protocol is a message based protocol and you don't have to manually frame your messages. The API that you're using to use the protocol may either have message size limits in place (to allow it to guarantee delivery of messages as a single chunk) or may present a streaming interface to allow for unlimited sized messages.
I ranted about this back during the standardisation process here.
WebSocket is a message-based protocol, so if you send a chunk of data as the payload of a WebSocket message, the peer will receive one separate WebSocket message with exactly that chunk of data as payload.

Resources