Handling messages over TCP - networking

I'm trying to send and receive messages over TCP by prefixing each message with its size.
Say the first three bytes are the length, and the rest is the message.
As a small example:
005Hello003Hey002Hi
I'll be using this method for large messages too, but the receive buffer will be a constant size, say 200 bytes. So there is a chance that a complete message may not be received (e.g. instead of 005Hello I get 005He), or even that a complete length may not be received (e.g. I get only 2 bytes of the length).
So, to get over this problem, I'll need to wait for the next read and append it to the incomplete message, and so on.
My question is: am I the only one having these difficulties stitching partial reads together, appending lengths, etc. to make messages complete, or is this really how individual messages on TCP usually need to be handled? Or is there a better way?

What you're seeing is 100% normal TCP behavior. It is completely expected that you'll loop receiving bytes until you get a "message" (whatever that means in your context). It's part of the work of going from a low-level TCP byte stream to a higher-level concept like "message".
And "usr" is right above. There are higher level abstractions that you may have available. If they're appropriate, use them to avoid reinventing the wheel.

So there is a chance that a complete message may not be received (e.g. instead of 005Hello I get 005He), or even that a complete length may not be received (e.g. I get only 2 bytes of the length).
Yes. A successful read on a TCP socket gives you at least one byte; that's all it guarantees.
Or is this really how individual messages on TCP usually need to be handled? Or is there a better way?
Try using higher-level primitives. For example, BinaryReader allows you to read exactly N bytes (it will internally loop). StreamReader lets you forget this peculiarity of TCP as well.
Even better is to use still higher-level abstractions such as HTTP (the request/response pattern, very common), protobuf as a serialization format, or web services, which automate pretty much all transport-layer concerns.
Don't do TCP if you can avoid it.
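
For comparison, here is a minimal Go sketch of the same idea: io.ReadFull plays the role of BinaryReader, looping internally until exactly N bytes have arrived. The readMessage helper is illustrative; only the 3-byte ASCII prefix format comes from the question.

```go
package main

import (
	"fmt"
	"io"
	"strconv"
	"strings"
)

// readMessage reads one length-prefixed message ("005Hello" style) from r.
// io.ReadFull loops internally until exactly len(p) bytes have arrived,
// so short reads from the TCP stream are handled for us.
func readMessage(r io.Reader) (string, error) {
	header := make([]byte, 3)
	if _, err := io.ReadFull(r, header); err != nil {
		return "", err
	}
	n, err := strconv.Atoi(string(header))
	if err != nil {
		return "", err
	}
	body := make([]byte, n)
	if _, err := io.ReadFull(r, body); err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	// A strings.Reader stands in for the TCP connection here.
	stream := strings.NewReader("005Hello003Hey002Hi")
	for {
		msg, err := readMessage(stream)
		if err != nil {
			break // io.EOF after the last complete message
		}
		fmt.Println(msg)
	}
}
```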

So, to get over this problem, I'll need to wait for the next read and append it to the incomplete message, and so on.
Yep, this is how things are done in socket-level code. For each socket you want to allocate a buffer at least as large as the kernel socket receive buffer, so that you can drain the entire kernel buffer in one read/recv/recvmsg call. Reading from a socket in a loop may starve other sockets in your application (this is one reason epoll is level-triggered by default: the edge-triggered mode forces application writers to read in a loop).
The first incomplete message is always kept at the beginning of the buffer; reading from the socket continues at the next free byte in the buffer, so new data automatically appends to the incomplete message.
Once reading is done, normally a higher-level callback is called with pointers to all the data read so far in the buffer. That callback should consume all complete messages in the buffer and return how many bytes it consumed (possibly 0 if only an incomplete message is present). The buffer-management code should then memmove the remaining unconsumed bytes (if any) to the beginning of the buffer. Alternatively, a ring buffer can be used to avoid moving those unconsumed bytes, but then the higher-level code has to cope with ring-buffer iterators, which it may not be prepared to do. Hence keeping the buffer linear may be the most convenient option.
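
Here is a minimal Go sketch of that linear-buffer scheme; the consume callback is hypothetical (it stands in for whatever higher-level parser consumes complete messages), and the buffer size is an assumption:

```go
package main

import (
	"errors"
	"net"
)

// readLoop drains conn into a single linear buffer and hands the buffered data
// to consume, which returns how many bytes it was able to parse (possibly 0).
func readLoop(conn net.Conn, consume func(buf []byte) int) error {
	buf := make([]byte, 64*1024) // sized near the kernel receive buffer
	filled := 0                  // number of valid bytes currently in buf

	for {
		if filled == len(buf) {
			return errors.New("message larger than buffer") // or grow buf
		}
		n, err := conn.Read(buf[filled:]) // continue at the next free byte
		if err != nil {
			return err
		}
		filled += n

		used := consume(buf[:filled]) // 0 if only a partial message is buffered
		// The "memmove" step: shift the unconsumed tail (a partial message,
		// if any) back to the beginning of the buffer.
		copy(buf, buf[used:filled])
		filled -= used
	}
}
```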

Related

MPI_Probe() for determining message size

A common usage of MPI_Probe is in determining the size of an incoming message so that enough memory can be allocated for the receive buffer. But this can also be done with a separate pair of MPI_Send/MPI_Recv calls, i.e. the sender process sends the message size to the receiver in a separate message. Can it be assumed that MPI_Probe is in general the faster option? Why? We can run some tests and compare the wall times, but the results may be implementation-dependent.
For short messages the latency is more important than the size of the message, so probing for a small message is probably faster.
Probing makes it easier to deal with MPI_ANY_SOURCE as the sender: otherwise you'd first have to determine where the size message comes from, and then do a specific receive from that source.
Instead of MPI_Probe, people often use MPI_Iprobe, which tells you whether there is a message at all. Yes, you could emulate that with multiple Irecvs, but why make your code so complicated?

Count sent and received bytes in Go in an http.Handler ServeHTTP function?

How can sent and received bytes be counted from within a ServeHTTP function in Go?
The count needs to be relatively accurate. Skipping connection establishment is not ideal, but acceptable. But headers must be included.
It also needs to be fast. Iterating over headers is generally too slow.
The counting itself doesn't need to occur within ServeHTTP, as long as the count for a given connection can be made available to ServeHTTP.
This must also not break HTTPS or HTTP/2.
Things I've Tried
It's possible to get a rough, slow estimate of received bytes by iterating over the Request headers. This is far too slow, and the Go standard library removes and combines headers, so it's not accurate either.
I tried writing an intercepting Listener that creates an internal Listener via tls.Listen or net.Listen. Its Accept() gets a net.Conn from the internal Listener's Accept(), then wraps it in an intercepting net.Conn whose Read and Write call through to the real net.Conn and count the bytes read and written. Those counts can then be made available to the ServeHTTP function via mutex-protected shared variables.
The problem is, the intercepting Conn breaks HTTP/2, because Go's internal libraries type-assert the net.Conn to a *tls.Conn (e.g. https://golang.org/src/net/http/server.go#L1730), and it doesn't appear possible in Go to wrap the object while still making that assertion succeed (if it were, that would solve this problem).
Counting sent bytes can be done relatively accurately by counting what is written to the ResponseWriter. Counting received bytes in the HTTP body is also achievable, via Request.Body. The critical issue here appears to be quickly and accurately counting request header bytes. Though again, also counting connection establishment bytes would be ideal.
Is this possible? How?
I think it is possible, but I can't say I've done it. However, based on browsing the stdlib implementation of the HTTP server and TLS listener, I don't see why it shouldn't be possible; the key is wrapping the connection before TLS instead of after. This also gets you a more accurate count of bytes on the wire, rather than a count of decrypted bytes.
You've already got an intercepting Listener, you just need to insert it in the right spot. Rather than passing your Listener to http.Serve (or wherever you're inserting it), you want to pass it to tls.NewListener first, which wraps it in the TLS handler, and then pass the result, which will be a TLS listener (making Go's HTTP/2 support happy) into the HTTP server.
Of course, if you want a count of decrypted bytes rather than wire bytes, you may be SOL - wrapping the net.Conn just won't get you there. You'll likely have to do the best you can with counting headers & body.
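
Here is a hypothetical sketch of that arrangement in Go. countingConn and countingListener are made-up names, and the cert/key paths are placeholders; srv.ServeTLS performs the tls.NewListener wrapping described above, so the HTTP server still sees *tls.Conn values and HTTP/2 negotiation keeps working. Note the counters here are per-listener totals; attributing bytes to an individual connection (e.g. via http.Server.ConnContext) is left out for brevity.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"sync/atomic"
)

// countingConn wraps a raw (pre-TLS) net.Conn and counts wire bytes.
type countingConn struct {
	net.Conn
	read, written *int64
}

func (c countingConn) Read(p []byte) (int, error) {
	n, err := c.Conn.Read(p)
	atomic.AddInt64(c.read, int64(n))
	return n, err
}

func (c countingConn) Write(p []byte) (int, error) {
	n, err := c.Conn.Write(p)
	atomic.AddInt64(c.written, int64(n))
	return n, err
}

// countingListener wraps each accepted connection before TLS sees it.
type countingListener struct {
	net.Listener
	read, written int64
}

func (l *countingListener) Accept() (net.Conn, error) {
	conn, err := l.Listener.Accept()
	if err != nil {
		return nil, err
	}
	return countingConn{conn, &l.read, &l.written}, nil
}

func main() {
	inner, err := net.Listen("tcp", ":8443")
	if err != nil {
		panic(err)
	}
	counted := &countingListener{Listener: inner}

	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "read %d written %d\n",
			atomic.LoadInt64(&counted.read), atomic.LoadInt64(&counted.written))
	})}

	// ServeTLS wraps the counting listener with tls.NewListener internally,
	// so HTTP/2 support stays intact. Cert/key paths are assumptions.
	panic(srv.ServeTLS(counted, "cert.pem", "key.pem"))
}
```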

I don't understand what exactly the function bytesToWrite() does (Qt)

I searched for bytesToWrite in the docs, and this is what I found: "For buffered devices, this function returns the number of bytes waiting to be written. For devices with no buffer, this function returns 0."
First, what does "buffered devices" mean? And can anyone please explain what exactly this function does, and where or how I can use it?
Many IO devices are buffered, which means that data isn't sent straight away, but it is accumulated to be sent in bulk when there is a sufficient amount.
This is done essentially to get better performance, as sending data normally has some fixed overhead (at the very least the syscall overhead), which is well amortized when sending data in bulk but would have to be paid on each write if no buffering were used.
(Notice that here we are only talking about QIODevice buffers; normally there are also all kinds of kernel-mode buffers, as well as buffers internal to the hardware devices themselves.)
bytesToWrite tells you how much stuff is in the QIODevice write buffer, i.e. how many bytes you wrote that are waiting to be actually written (as in, given to the OS to write).
I never actually had to use that member, but I suppose it could be useful, e.g., in a producer-consumer scenario (if the write buffer drops below some threshold, it's time to produce the next chunk of data to send), to manually handle buffering in some places, or even just for debugging/logging purposes.
It's actually very useful when you're using an asynchronous API.
You can, for example, use it inside a bytesWritten() slot to tell whether the buffer is empty and the data has been fully written or not.

How do you read without specifying the length of a byte slice beforehand, with the net.TCPConn in golang?

I was trying to read some messages from a TCP connection with a redis client (a terminal just running redis-cli). However, the Read method in the net package requires me to pass in a slice as an argument. Whenever I pass a slice with no length, the connection crashes and the Go program halts. I am not sure beforehand what length my byte messages are going to be, so unless I specify a slice that is ridiculously large, the connection will always close, which seems wasteful. I was wondering: is it possible to keep a connection open without knowing the length of the message beforehand? I would love a solution to my specific problem, but I feel this question is more general. Why do I need to know the length beforehand? Can't the library just give me a slice of the correct size?
Or what other solutions do people suggest?
Not knowing the message size is precisely the reason you must specify the Read size (this goes for any networking library, not just Go). TCP is a stream protocol. As far as the TCP protocol is concerned, the message continues until the connection is closed.
If you know you're going to read until EOF, use ioutil.ReadAll
Calling Read isn't guaranteed to get you everything you're expecting. It may return less, it may return more, depending on how much data you've received. Libraries that do IO typically read and write through a "buffer": you have a "read buffer", a pre-allocated slice of bytes (up to 32k is common), and you re-use that slice each time you want to read from the network. This is why IO functions return the number of bytes, so you know how much of the buffer was filled by the last operation. If the buffer was filled, or you're still expecting more data, you just call Read again.
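
A minimal sketch of that pattern, assuming the Go program is the listener that redis-cli connects to (the port is arbitrary):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	ln, err := net.Listen("tcp", ":6380") // point redis-cli at this port
	if err != nil {
		panic(err)
	}
	conn, err := ln.Accept()
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	buf := make([]byte, 32*1024) // pre-allocated read buffer, reused on every call
	for {
		n, err := conn.Read(buf) // n reports how much of buf this Read filled
		if err != nil {
			return // io.EOF once the peer closes the connection
		}
		fmt.Printf("received %d bytes: %q\n", n, buf[:n])
	}
}
```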
A bit late but...
One of the questions was how to determine the message size. The answer given by JimB was that TCP is a streaming protocol, so there is no real end.
I believe this answer is incorrect. TCP divides a bitstream up into sequential packets. Each packet has an IP header and a TCP header; see Wikipedia and here. The IP header of each packet contains a field for the length of that packet. You would have to do some math to subtract out the TCP header length to arrive at the actual data length.
In addition, the maximum length of a message can be specified in the TCP header.
Thus you can provide a buffer of sufficient length for your read operation. However, you have to read the packet header information first. You probably should not accept a TCP connection if the max message size is longer than you are willing to accept.
Normally the sender would terminate the connection with a FIN packet (see 1), not an EOF character.
EOF in the read operation will most likely indicate that a packet was not fully transmitted within the allotted time.

TCP: multiple messages in a row

Is it within the TCP standard that multiple messages, sent from server to client in a row, will be received by the client in the same order (and that the bytes of one message will not be scattered among other messages)?
TCP provides an in-order byte stream delivery service. The bytes won't arrive in another order but the number of writes need not be equal to the number of reads.
You will never read bytes in any order other than that in which they were sent.
You can make no assumptions about "messages": TCP doesn't know about messages, only bytes (see above), and both the sender and the receiver can coalesce and split such "messages".
TCP uses a sequence number to identify each byte of data. The sequence number identifies the order of the bytes sent from each computer so that the data can be reconstructed in order, regardless of any fragmentation, disordering, or packet loss that may occur during transmission.
I agree with #cnicutar.
How are you deserializing the objects? I suspect the problem lies there.
For example, if your messages are
ABCD followed 200 ms later by PQR, they may appear as:
ABC followed by DPQR,
or ABCDPQR,
or even AB followed by CD followed by PQ followed by R.
Basically, you cannot make assumptions based on the time at which the data is received.
The deserialization logic should know the object boundaries within a stream of bytes. This information should be encoded into the stream by the serialization logic.
If you are using Java, you can use ObjectInputStream and ObjectOutputStream and not be bothered with serialization issues.
J2ME Polish has a good serialization utility that can be very easily ported to other platforms. I have used it myself in a live environment.
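
In Go, encoding/gob plays a similar role to ObjectInputStream/ObjectOutputStream: the encoder frames each value it writes, so the decoder can recover the object boundaries from the raw byte stream. A minimal sketch (the Message type and port are illustrative):

```go
package main

import (
	"encoding/gob"
	"fmt"
	"net"
)

type Message struct {
	Text string
}

func main() {
	ln, err := net.Listen("tcp", ":9000")
	if err != nil {
		panic(err)
	}

	// Sender: each Encode puts one framed value on the stream.
	go func() {
		conn, err := net.Dial("tcp", "localhost:9000")
		if err != nil {
			return
		}
		defer conn.Close()
		enc := gob.NewEncoder(conn)
		enc.Encode(Message{"Hello"})
		enc.Encode(Message{"Hey"}) // two writes, boundaries preserved by gob
	}()

	// Receiver: the decoder reassembles values regardless of how the
	// bytes were split or coalesced by TCP.
	conn, err := ln.Accept()
	if err != nil {
		panic(err)
	}
	dec := gob.NewDecoder(conn)
	for {
		var m Message
		if err := dec.Decode(&m); err != nil {
			break // io.EOF once the sender closes
		}
		fmt.Println("received:", m.Text)
	}
}
```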
