Imagine that I have a server and a client talking via WebSocket. Each side sends chunks of data to the other, and different chunks may have different lengths.
Am I guaranteed that if the server sends a chunk in one call, the client will receive it in one message callback, and vice versa? In other words, does WebSocket have a built-in 'packing' ability, so that I don't have to care whether my data gets split across several callbacks during transmission?
In theory, WebSocket is a message-based protocol. However, bear in mind that...
WebSocket messages consist of one or more frames.
A frame can be either a complete frame or a fragmented frame.
Messages themselves do not have any length indication built into the protocol, only frames do.
Frames can have a payload length of up to 9,223,372,036,854,775,807 bytes (because the protocol allows for a 63-bit length indicator).
The primary purpose of fragmentation is to allow sending a message that is of unknown size when the message is started without having to buffer that message.
So...
A single WebSocket "message" could consist of an unlimited number of 9,223,372,036,854,775,807 byte fragments.
This may make it difficult for an implementation to always deliver complete messages to you via its API...
So whilst, in the general case, the answer to your question is that WebSocket is a message-based protocol and you don't have to frame your messages manually, the API you're using may either impose message size limits (so that it can guarantee delivery of each message as a single chunk) or present a streaming interface to allow for messages of unlimited size.
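To make the length encoding above concrete, here is a minimal sketch in Go (illustrative only, not a complete parser) that decodes the FIN bit, opcode and payload length from the start of an unmasked WebSocket frame header, covering the 7-bit, 16-bit and 63-bit length forms:

package main

import (
    "encoding/binary"
    "fmt"
)

// decodeFrameHeader reads the FIN flag, opcode and payload length from the
// start of a WebSocket frame (see RFC 6455, section 5.2). It assumes an
// unmasked frame and ignores the extension bits.
func decodeFrameHeader(b []byte) (fin bool, opcode byte, length uint64) {
    fin = b[0]&0x80 != 0
    opcode = b[0] & 0x0F
    switch l := b[1] & 0x7F; {
    case l < 126:
        length = uint64(l)
    case l == 126: // the next 2 bytes hold a 16-bit length
        length = uint64(binary.BigEndian.Uint16(b[2:4]))
    default: // l == 127: the next 8 bytes hold a 63-bit length
        length = binary.BigEndian.Uint64(b[2:10])
    }
    return
}

func main() {
    // A tiny unmasked text frame: FIN=1, opcode=1 (text), payload "Hi".
    frame := []byte{0x81, 0x02, 'H', 'i'}
    fin, op, n := decodeFrameHeader(frame)
    fmt.Println(fin, op, n) // true 1 2
}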
I ranted about this back during the standardisation process here.
WebSocket is a message-based protocol, so if you send a chunk of data as the payload of a WebSocket message, the peer will receive one separate WebSocket message with exactly that chunk of data as payload.
I'm studying the Socket Programming HOWTO, and at some point the author says that
A protocol like HTTP uses a socket for only one transfer.
Is it because of the design of the HTTP protocol itself? Or is it because HTTP is based on TCP, so every protocol built on top of TCP must use one socket for only one transfer?
This statement is taken out of context. The context is to point out that TCP is not a message-based protocol but an unstructured byte stream, and that to get message semantics one needs some way to determine where a message ends.
It then takes HTTP as an example where a message might simply end with a connection close and points out the limitations - namely only a single message per connection per direction. Then it goes on to describe how protocols can be designed without this limitation, i.e. having multiple messages per connection.
HTTP can still be used like this, i.e. a single request whose response ends with a connection close. This was the design of HTTP/0.9 and can still be done with HTTP/1. But HTTP/1 can also be used for multiple messages, one after the other. HTTP/2 can handle multiple messages in parallel, multiplexed over a single TCP connection. And HTTP/3 does not even use TCP anymore.
Do all protocols based on TCP use one socket per transfer?
Protocols are not limited to one connection ("socket") per message ("transfer"). Depending on the design of the protocol, multiple messages can be sent one after the other by using a pre-known message size or a clear message delimiter. Some protocols send multiple messages in parallel by implementing a multiplexing layer on top of TCP. Some protocols even use multiple TCP connections in parallel to deliver a single message, i.e. distributing the message over multiple connections.
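As a concrete (purely illustrative) example of the "pre-known message size" approach, here is a minimal Go sketch of length-prefixed framing: every message is written as a 4-byte big-endian length followed by the payload, so any number of messages can share a single TCP connection.

package framing

import (
    "encoding/binary"
    "io"
)

// WriteMessage prefixes msg with its length as a 4-byte big-endian
// integer, so the receiver knows exactly where each message ends.
func WriteMessage(w io.Writer, msg []byte) error {
    var hdr [4]byte
    binary.BigEndian.PutUint32(hdr[:], uint32(len(msg)))
    if _, err := w.Write(hdr[:]); err != nil {
        return err
    }
    _, err := w.Write(msg)
    return err
}

// ReadMessage reads back one length-prefixed message, regardless of how
// the bytes were split into TCP segments on the wire.
func ReadMessage(r io.Reader) ([]byte, error) {
    var hdr [4]byte
    if _, err := io.ReadFull(r, hdr[:]); err != nil {
        return nil, err
    }
    msg := make([]byte, binary.BigEndian.Uint32(hdr[:]))
    _, err := io.ReadFull(r, msg)
    return msg, err
}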
That statement was probably written in 1996 or earlier. Since 1997, HTTP supports persistent connections, reusing the same TCP connection and the same socket for multiple queries.
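This reuse is easy to observe from Go's net/http client, for instance. The following sketch (the URL is just a placeholder) uses net/http/httptrace to report whether each request got a fresh or a reused connection; with default keep-alive settings, the second request normally reports a reused one.

package main

import (
    "fmt"
    "io"
    "net/http"
    "net/http/httptrace"
)

func get(client *http.Client, url string) {
    trace := &httptrace.ClientTrace{
        GotConn: func(info httptrace.GotConnInfo) {
            fmt.Printf("GET %s  reused connection: %v\n", url, info.Reused)
        },
    }
    req, _ := http.NewRequest("GET", url, nil)
    req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
    resp, err := client.Do(req)
    if err != nil {
        fmt.Println(err)
        return
    }
    io.Copy(io.Discard, resp.Body) // drain the body so the connection can be reused
    resp.Body.Close()
}

func main() {
    client := &http.Client{}
    url := "http://example.com/" // placeholder host
    get(client, url)             // first request: new connection
    get(client, url)             // second request: usually reused
}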
I ask this question because I had a very weird puzzling experience that I am about to tell.
I am instrumenting an HTTP API server to observe its behavior in the presence of latency between the server and the clients. I had a setup consisting of a single server and a dozen clients connected over a 10 Gbps Ethernet fabric. I measured the time it took to serve certain API requests in 5 scenarios. In each scenario, I set the latency between the server and the clients to one of the following values using the tc-netem(8) utility: no latency (I call this the baseline), 25ms, 50ms, 250ms or 400ms.
Because I am using histogram buckets to quantify the service time, I observed that all the requests were processed in less than 50ms in every scenario, which clearly doesn't make any sense: in the 400ms case, for example, it should be at least around 400ms (as I am only measuring the duration from the moment the request hits the server to the moment the HTTP Write() function returns). Note that the response objects are between 1 KB and 10 KB in size.
Initially, I suspected that the *http.ResponseWriter's Write() function is asynchronous and returns immediately, before the data is received by the client. So I decided to test this hypothesis by writing a toy HTTP server that serves the content of a file generated with dd(1) and /dev/urandom, so that I can reconfigure the response size. Here is the server:
package main

import (
    "io/ioutil"
    "log"
    "net/http"
    "time"
)

var response []byte

func httpHandler(w http.ResponseWriter, r *http.Request) {
    switch r.Method {
    case "GET":
        now := time.Now()
        w.Write(response)
        elapsed := time.Since(now)
        mcs := float64(elapsed / time.Microsecond)
        s := elapsed.Seconds()
        log.Printf("Elapsed time in mcs: %v, sec: %v", mcs, s)
    }
}

func main() {
    response, _ = ioutil.ReadFile("BigFile")
    http.HandleFunc("/hd", httpHandler)
    http.ListenAndServe(":8089", nil)
}
Then I start the server like this:
dd if=/dev/urandom of=BigFile bs=$VARIABLE_SIZE count=1 && ./server
From the client side, I issue: time curl -X GET $SERVER_IP:8089/hd --output /dev/null
I tried many values of $VARIABLE_SIZE in the range [1 KB, 500 MB], using an emulated latency of 400ms between the server and each one of the clients. To make a long story short, I noticed that the Write() method blocks until the data is sent when the response size is big enough to be visually noticeable (on the order of tens of megabytes). However, when the response size is small, the server doesn't report a sane servicing time compared to the value reported by the client. For a 10 KB file, the client reports 1.6 seconds while the server reports 67 microseconds (which makes no sense at all; even I, as a human, noticed a delay on the order of a second, as reported by the client).
To go a little further, I tried to find out the response size starting from which the server returns an acceptable time. After many trials using a binary search, I discovered that the server always reports a few microseconds [20us, 600us] for responses that are 86501 bytes or smaller, and reports expected (acceptable) times for responses that are 86502 bytes or larger (usually half of the time reported by the client). As an example, for an 86501-byte response the client reported 4 seconds while the server reported 365 microseconds; for 86502 bytes, the client reported 4s and the server reported 1.6s. I repeated this experiment many times using different servers, and the behavior is always the same. The number 86502 looks like magic!
This experiment explains the weird observations I initially had, because all the API responses were less than 10 KB in size. However, it opens the door to a serious question: what on earth is happening, and how can this behavior be explained?
I've tried to search for answers but didn't find anything. The only thing I can think of is that maybe it is related to Linux socket buffer sizes and whether Go makes the system call in a non-blocking fashion. However, AFAIK, the TCP packets transporting the HTTP response should all be acknowledged by the receiver (the client) before the sender (the server) can return! Breaking this assumption (as seems to happen here) could lead to disasters! Can someone please provide an explanation for this weird behavior?
Technical details:
Go version: 12
OS: Debian Buster
Arch: x86_64
I'd speculate the question is, in fact, stated in a wrong way: you seem to be guessing about how HTTP works instead of looking at the whole stack.
The first thing to consider is that HTTP (1.0 and 1.1, which is the standard version since long time ago) does not specify any means for either party to acknowledge data reception.
There is an implicit acknowledgement of the fact that the server received the client's request: the server is expected to respond to the request, and when it responds, the client can be reasonably sure the server actually received the request.
There is no such thing working in the other direction though: the server does not expect the client to somehow "report back" — on the HTTP level — that it had managed to read the whole server's response.
The second thing to consider is that HTTP is carried over TCP connections (or over TLS, which is not really different in this respect, as TLS uses TCP as well).
An oft-forgotten fact about TCP is that it has no message framing — that is, TCP performs bi-directional transfer of opaque byte streams.
TCP only guarantees total ordering of bytes in these streams; it does not in any way preserve any occasional "batching" which may naturally result from the way you work with TCP via a typical programming interface — by calling some sort of "write this set of bytes" function.
Another thing which is often forgotten about TCP is that while it indeed uses acknowledgements to track which part of the outgoing stream was actually received by the receiver, this is a protocol detail which is not exposed to the programming interface level (at least not in any common implementation of TCP I'm aware of).
These features mean that if one wants to use TCP for message-oriented data exchange, one needs to implement both message boundaries (so-called "framing") and acknowledgement of the reception of individual messages in the protocol above TCP.
HTTP is a protocol which is above TCP but while it implements framing, it does not implement explicit acknowledgement besides the server responding to the client, described above.
Now consider that most if not all TCP implementations employ buffering in various parts of the stack. At least, the data which is submitted by the program gets buffered, and the data which is read from the incoming TCP stream gets buffered, too.
Finally consider that most commonly used TCP implementations provide for sending data into an active TCP connection through the use of a call allowing to submit a chunk of bytes of arbitrary length.
Considering the buffering described above, such a call typically blocks until all the submitted data gets copied to the sending buffer.
If there's no room in the buffer, the call blocks until the TCP stack manages to stream some amount of data from that buffer into the connection — freeing some room to accept more data from the client.
What does all of the above mean for net/http.ResponseWriter.Write interacting with a typical contemporary TCP/IP stack?
A call to Write will eventually try to submit the specified data to the TCP/IP stack.
The stack would try to copy that data over into the sending buffer of the corresponding TCP connection — blocking until all the data manages to be copied.
After that you have essentially lost any control about what happens with that data: it may eventually be successfully delivered to the receiver, or it may fail completely, or some part of it might succeed and the rest will not.
What this means for you is that when net/http.ResponseWriter.Write blocks, it blocks on the sending buffer of the TCP socket underlying the HTTP connection you're operating on.
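A minimal sketch of this behavior (the port, sizes and delay are arbitrary choices for illustration): a raw TCP client writes a small payload to a peer that deliberately does not read for a while, yet the Write call returns almost immediately because the data fits into the kernel's send buffer.

package main

import (
    "fmt"
    "net"
    "time"
)

func main() {
    ln, err := net.Listen("tcp", "127.0.0.1:0") // any free port
    if err != nil {
        panic(err)
    }

    // The peer accepts the connection but deliberately does not read yet.
    go func() {
        conn, _ := ln.Accept()
        time.Sleep(3 * time.Second)
        buf := make([]byte, 1<<20)
        conn.Read(buf)
        conn.Close()
    }()

    conn, err := net.Dial("tcp", ln.Addr().String())
    if err != nil {
        panic(err)
    }
    payload := make([]byte, 10*1024) // 10 KB: fits into the kernel send buffer

    start := time.Now()
    conn.Write(payload) // returns once the kernel has buffered the data
    fmt.Println("Write returned after", time.Since(start))
    // Typically prints microseconds even though the peer has not read a
    // single byte yet; a payload much larger than the send buffer would
    // make this call block until the peer starts draining the connection.
    conn.Close()
}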
Note though, that if the TCP/IP stack detects an irreparable problem with the connection underlying your HTTP request/response exchange, such as a segment with the RST flag coming from the remote party meaning the connection has been unexpectedly torn down, this problem will bubble up through Go's HTTP stack as well, and Write will return a non-nil error.
In this case, you will know that the client was likely not able to receive the complete response.
I read here that WebSocket is a frame-based protocol instead of a stream-based one. But the same source also states: "Why are WebSockets frame-based and not stream-based? I don't know and, just like you, I'd love to learn more, so if you have an idea, feel free to add comments and resources in the responses below."
Can anyone explain what the advantage of using a frame-based protocol in WebSocket is?
Maybe this pre-existing answer will help provide some reference for the discussion.
By utilizing frames and having a message-based protocol (vs. a stream-based protocol), it becomes easier to write Web-oriented applications.
Common actions, such as sending JSON data, become easier and don't require every application to implement a network-data buffering/caching layer for fragmented messages.
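For example, with a message-oriented API such as the third-party gorilla/websocket package (used here purely as an illustration; the endpoint path and JSON payload are made up), sending a JSON value is a single call and the peer receives it as a single message, with no framing code in the application:

package main

import (
    "log"
    "net/http"

    "github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{}

func wsHandler(w http.ResponseWriter, r *http.Request) {
    conn, err := upgrader.Upgrade(w, r, nil)
    if err != nil {
        log.Println("upgrade:", err)
        return
    }
    defer conn.Close()

    // One call, one WebSocket message: the library takes care of frame
    // headers, fragmentation and reassembly. A browser client's
    // onmessage handler fires once with the complete JSON payload.
    err = conn.WriteJSON(map[string]string{"event": "hello", "payload": "world"})
    if err != nil {
        log.Println("write:", err)
    }
}

func main() {
    http.HandleFunc("/ws", wsHandler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}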
EDIT (answering comment)
The TCP/IP layer guarantees network packet delivery and ordering, but has no concept of data length - it's a streaming protocol and it promises that the stream will arrive in order, no more than that.
If any data arrives out of order, then the TCP/IP protocol layer will re-order the data. This may require an internal cache/buffer that holds on to the existing data while waiting for the missing data.
In contrast, the WebSocket is message based and is aware of the message data length.
The WebSocket frames use a header with a data length (total / partial) that allows the WebSocket protocol layer to concatenate all the data into a single unit, even when it's distributed across multiple TCP/IP packets or (even) multiple WebSocket frames.
This requires the protocol layer to keep the data in an internal buffer until all the expected data in the message arrives. The WebSocket protocol will forward the message to the application only when all of its data has arrived.
This "holding on to data" in order to extract message "units" from the stream is the caching/buffering element I was referring to.
I'm reading about WebSocket and I see that the protocol has data fragmentation (frames): a WebSocket message is composed of one or more frames. But isn't that what TCP already does (fragmentation of data)? I'm confused.
Fragmentation in the context of data transfer just means splitting the original data into smaller parts for transfer and combining these fragments later (for example at the recipients side) again to recreate the original data.
Fragmentation is often done if the underlying layer cannot handle larger messages, or if larger messages would result in performance problems. Such problems might arise because it is more expensive if one large message is lost and needs to be repeated instead of only a small fragment. Or it can be a performance problem if the transfer of one large message would block the delivery of smaller messages; in this case it is useful to split the large message into fragments and deliver these fragments interleaved with the other messages, so that those don't have to wait for delivery until the large message is done.
Fragmentation of messages in WebSockets is just one of the many types of fragmentation which exist at various layers at the data transport, like:
IP packets can be fragmented at the sender or at some middlebox and get reassembled at the receiving end.
TCP is a data stream. The various parts of the stream are transferred in different IP packets and get reassembled in the correct order at the recipient.
Application layer protocols like HTTP can have fragments too, for example the chunked Transfer-Encoding mode within HTTP or the fragments in WebSockets.
And at even higher layers there can be more fragments, like the spreading of a single large ZIP file into multiple parts onto floppy disks in former times or the accelerating of downloads by requesting different parts of the same file in parallel connections and combining these at the recipient.
I love the detailed answer by Steffen Ullrich, but I wish to add a few specific details regarding the differences between raw TCP/IP and the added Websockets layer.
TCP/IP is a stream protocol, meaning the application receives the data as fragmented pieces as data become available, with no clear indication of the fragmented "packet boundaries" or the original (non-fragmented) data structure.
The Websocket protocol is a message based protocol, meaning that the application will only receive the full Websocket message once all the fragmented pieces have arrived and put back together.
As a very simplified example:
TCP/IP: if a 50 Mb file is sent using TCP, the application will probably receive a piece of the file at a time and it will need to piece the file back together (possibly saving each piece to a temporary disk storage).
Websocket: if a 50 Mb file is sent using the Websocket protocol, the application will receive the whole of the 50Mb in one message (and the storage of all of the data, memory or disk, will be dictated by the Websocket layer, not the application layer).
Note that the WebSocket protocol is an additional layer over the TCP/IP protocol, so data is streamed over TCP/IP and the WebSocket layer puts the pieces back together before forwarding the original (whole) message.
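A minimal sketch of that "putting the pieces back together" step (the Frame type here is a made-up simplification, not a real library type): the layer keeps appending fragment payloads until a frame with the FIN bit arrives, and only then hands the application one complete message.

package wsreassembly

// Frame is a simplified stand-in for a parsed WebSocket frame: Fin marks
// the final fragment of a message, and Payload holds the frame's data.
// (A real implementation would also track opcodes, masking, etc.)
type Frame struct {
    Fin     bool
    Payload []byte
}

// ReassembleMessage keeps buffering fragment payloads until a frame with
// the FIN bit arrives, and only then returns one complete message, which
// is what the WebSocket layer does on behalf of the application.
func ReassembleMessage(nextFrame func() (Frame, error)) ([]byte, error) {
    var message []byte
    for {
        f, err := nextFrame()
        if err != nil {
            return nil, err
        }
        message = append(message, f.Payload...)
        if f.Fin {
            return message, nil
        }
    }
}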
5.4. Fragmentation
A secondary use-case for fragmentation is for multiplexing, where it is not desirable for a large message on one logical channel to monopolize the output channel, so the multiplexing needs to be free to split the message into smaller fragments to better share the output channel. (Note that the multiplexing extension is not described in this document.)
Even though it's listed as a secondary reason, I'd say that's the primary reason for the fragmentation feature. Imagine you start sending a first message that is 1 GB in size and, right as you start sending it, you also send a second message that is 1 KB. Fragmentation allows the sender to inject the second message in between the individual frames of the first one, so the receiver does not need to wait for the whole 1 GB to be transferred and can receive and handle the 1 KB message right away.
I'm writing a program using Java non-blocking sockets and TCP. I understand that TCP is a stream protocol, but the underlying IP protocol uses packets. When I call SocketChannel.read(ByteBuffer dst), will I always get the whole content of an IP packet, or may the read end at any position in the middle of a packet?
This matters because I'm trying to send individual messages through the channel; each message is small enough to be sent within a single IP packet without being fragmented. It would be cool if I could always get a whole message by calling read() on the receiver side; otherwise I have to implement some method to reassemble the messages.
Edit: assume that, on the sender side, messages are sent with a long interval (like 1 second), so they aren't going to be grouped together in one IP packet. On the receiver side, the buffer used to call read(ByteBuffer dst) is big enough to hold any message.
TCP is a stream of bytes. Each read will receive between 1 byte and the smaller of the buffer size that you supplied and the number of bytes that are available to read at that time.
TCP knows nothing of your concept of messages. Each send by the client can result in 0 or more reads being required at the other end; zero or more because you might get a single read that returns more than one of your 'messages'.
You should ALWAYS write your read code such that it can deal with your message framing and either reassemble partial messages or split multiple ones.
You may find that if you don't bother with this complexity then your code will seem to 'work' most of the time, don't rely on that. As soon as you are running on a busy network or across the internet, or as soon as you increase the size of your messages you WILL be bitten by your broken code.
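A minimal sketch of such read code, written in Go for consistency with the rest of this page (the same pattern applies to Java's SocketChannel) and assuming a hypothetical wire format where every message carries a 4-byte big-endian length prefix: incoming bytes are accumulated in a buffer, and complete messages are extracted from it however the reads happen to be split or grouped.

package streamread

import (
    "encoding/binary"
    "net"
)

// readMessages shows the accumulate-and-extract pattern: partial messages
// are kept in buf until the rest arrives, and a single Read may yield
// several complete messages at once.
func readMessages(conn net.Conn, handle func(msg []byte)) error {
    var buf []byte
    chunk := make([]byte, 4096)
    for {
        n, err := conn.Read(chunk)
        if err != nil {
            return err
        }
        buf = append(buf, chunk[:n]...)

        // Extract as many complete messages as the buffer currently holds.
        for len(buf) >= 4 {
            msgLen := int(binary.BigEndian.Uint32(buf[:4]))
            if len(buf) < 4+msgLen {
                break // the rest of this message has not arrived yet
            }
            handle(buf[4 : 4+msgLen])
            buf = buf[4+msgLen:]
        }
    }
}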
I talk about TCP message framing some more here: http://www.serverframework.com/asynchronousevents/2010/10/message-framing-a-length-prefixed-packet-echo-server.html and here: http://www.serverframework.com/asynchronousevents/2010/10/more-complex-message-framing.html though it's in terms of a C++ implementation so it may or may not be of interest to you.
The socket API makes no guarantee that send() and recv() calls correlate to datagrams for TCP sockets. On the sending side, things may get regrouped: for example, the system may defer sending one datagram to see whether the application has more data. On the receiving side, a read call may retrieve data from multiple datagrams, or a partial datagram if the size specified by the caller requires breaking a packet apart.
In other words, the TCP socket API assumes you have a stream of bytes, not a sequence of packets. You need to make sure you keep calling read() until you have enough bytes for a request.
From the SocketChannel documentation:
A socket channel in non-blocking mode, for example, cannot read any more bytes than are immediately available from the socket's input buffer;
So if your destination buffer is large enough, you should be able to consume all the data currently in the socket's input buffer.