TCP: multiple messages in a row

Does the TCP standard guarantee that multiple messages sent in a row from server to client will be received by the client in the same order (and that the bytes of one message will not be interleaved with those of other messages)?

TCP provides an in-order byte stream delivery service. The bytes won't arrive in a different order, but the number of reads need not equal the number of writes.
You will never read bytes in an order other than the one in which they were sent.
You can make no assumptions about "messages". TCP doesn't know about messages, only bytes (see above). Both the sender and the receiver can coalesce and split such "messages".

TCP uses a sequence number to identify each byte of data. The sequence number identifies the order of the bytes sent from each computer so that the data can be reconstructed in order, regardless of any fragmentation, disordering, or packet loss that may occur during transmission.

I agree with @cnicutar.
How are you deserializing the objects? I suspect the problem lies there.
For example, if you send ABCD followed 200 ms later by PQR, the receiver may see:
ABC followed by PQR
or ABCDPQR
or even AB followed by CD followed by PQ followed by R.
Basically you cannot make assumptions based on time of receiving the data.
The deserialization logic should know the object boundaries within a stream of bytes. This information should be encoded into the stream by the serialization logic.
If you are using Java, you can use ObjectInputStream & ObjectOutputStream and not be bothered about serialization issues.
J2ME Polish has a good serialization utility that can be very easily ported to other platforms. I have used it myself in a live environment.
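To make the point about encoding object boundaries into the stream concrete, here is a minimal sketch in Python. The 2-byte length prefix and the names frame, Deframer and decode_chunks are assumptions made up for this illustration, not part of any library. However the bytes are split or coalesced on the way, the same messages come out.

```python
import struct

# Hypothetical wire format for this sketch: 2-byte big-endian length, then payload.
HEADER = struct.Struct("!H")


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length so the receiver can find where it ends."""
    return HEADER.pack(len(payload)) + payload


class Deframer:
    """Rebuilds messages from arbitrarily split or coalesced chunks."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add received bytes and return every message that is now complete."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # body not complete yet, wait for more bytes
            messages.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return messages


def decode_chunks(chunks) -> list:
    """Feed a sequence of chunks through one Deframer and collect the messages."""
    deframer = Deframer()
    out = []
    for chunk in chunks:
        out.extend(deframer.feed(chunk))
    return out


stream = frame(b"ABCD") + frame(b"PQR")
# One big read, or one byte per read: the decoded messages are identical.
print(decode_chunks([stream]))
print(decode_chunks([stream[i:i + 1] for i in range(len(stream))]))
```

Timing never enters into it: the decoder only looks at the length prefix, which is the information the serialization side put into the stream.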

Related

Handling messages over TCP

I'm trying to send and receive messages over TCP, with the size of each message prepended to it.
Say the first three bytes are the length and the message follows:
As a small example:
005Hello003Hey002Hi
I'll be using this method for large messages, but the receive buffer size will be a constant, say 200 bytes. So there is a chance that a complete message may not be received (e.g. instead of 005Hello I get 005He), or that even the length may arrive incomplete (e.g. I get only 2 bytes of the length).
So, to get over this problem, I'll need to wait for next message and append it to the incomplete message etc.
My question is: am I the only one having these difficulties with appending messages to each other, appending lengths, etc. to make them complete? Or is this really how individual messages usually need to be handled over TCP? Or is there a better way?
What you're seeing is 100% normal TCP behavior. It is completely expected that you'll loop receiving bytes until you get a "message" (whatever that means in your context). It's part of the work of going from a low-level TCP byte stream to a higher-level concept like "message".
And "usr" is right above. There are higher level abstractions that you may have available. If they're appropriate, use them to avoid reinventing the wheel.
So there is a chance that a complete message may not be received (e.g. instead of 005Hello I get 005He), or that even the length may arrive incomplete (e.g. I get only 2 bytes of the length).
Yes. TCP gives you at least one byte per read, that's all.
Or is this really how individual messages usually need to be handled over TCP? Or is there a better way?
Try using higher-level primitives. For example, BinaryReader allows you to read exactly N bytes (it will internally loop). StreamReader lets you forget this peculiarity of TCP as well.
Even better is using even higher-level abstractions such as HTTP (request/response pattern - very common), protobuf as a serialization format, or web services, which automate pretty much all transport layer concerns.
Don't do TCP if you can avoid it.
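If you do end up at the socket level, the "read exactly N bytes" loop mentioned above is short. The following is a sketch in Python rather than .NET; recv_exact and recv_message are hypothetical helper names, and the wire format is the question's three ASCII digits of length followed by the body.

```python
import socket


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Loop over recv() until exactly n bytes have arrived or the peer closes."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))  # may return fewer bytes than requested
        if not chunk:
            raise ConnectionError(f"peer closed after {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)


def recv_message(sock: socket.socket) -> bytes:
    """Read one message: 3 ASCII digits of length, then that many bytes."""
    length = int(recv_exact(sock, 3).decode("ascii"))
    return recv_exact(sock, length)


if __name__ == "__main__":
    # A connected local socket pair stands in for a real TCP connection.
    sender, receiver = socket.socketpair()
    sender.sendall(b"005He")           # first write ends mid-message
    sender.sendall(b"llo003Hey002Hi")  # second write completes it and adds two more
    sender.close()
    print([recv_message(receiver) for _ in range(3)])
    receiver.close()
```

The caller never sees a partial length or a partial body; the price is that recv_exact blocks until the bytes arrive, so this shape suits blocking sockets or one thread per connection.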
So, to get over this problem, I'll need to wait for next message and append it to the incomplete message etc.
Yep, this is how things are done in socket-level code. For each socket you would want to allocate a buffer at least as large as the kernel socket receive buffer, so that you can read the entire kernel buffer in one read/recv/recvmsg call. Reading from the socket in a loop may starve other sockets in your application (this is why epoll is level-triggered by default: edge-triggered mode forces application writers to read in a loop until the socket is drained).
The first incomplete message is always kept in the beginning of the buffer, reading the socket continues at the next free byte in the buffer, so that it automatically appends to the incomplete message.
Once reading is done, normally a higher-level callback is called with pointers to all the data read into the buffer. That callback should consume all complete messages in the buffer and return how many bytes it has consumed (may be 0 if there is only an incomplete message). The buffer management code should memmove the remaining unconsumed bytes (if any) to the beginning of the buffer. Alternatively, a ring buffer can be used to avoid moving those unconsumed bytes, but in this case the higher-level code must be able to cope with ring-buffer iterators, which it may not be ready for. Hence keeping the buffer linear may be the most convenient option.
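A rough sketch of that linear-buffer scheme, in Python instead of C. ReceiveBuffer, free_space, commit and parse_messages are names invented for this illustration, and the callback parses the question's 3-digit length format. In real code the free tail returned by free_space() would be handed to sock.recv_into().

```python
class ReceiveBuffer:
    """Linear buffer: incomplete data stays at the front, new data is appended."""

    def __init__(self, capacity, on_data):
        self._buf = bytearray(capacity)
        self._used = 0
        self._on_data = on_data  # callback(view) -> number of bytes consumed

    def free_space(self):
        """Writable view of the unused tail, e.g. for sock.recv_into()."""
        return memoryview(self._buf)[self._used:]

    def commit(self, n):
        """Record n freshly written bytes, run the callback, compact the buffer."""
        self._used += n
        consumed = self._on_data(memoryview(self._buf)[:self._used])
        remaining = self._used - consumed
        # Equivalent of memmove(): slide the unconsumed bytes to the front.
        self._buf[:remaining] = self._buf[consumed:self._used]
        self._used = remaining


def parse_messages(view, sink):
    """Append every complete '005Hello'-style message to sink; return bytes consumed."""
    pos = 0
    while len(view) - pos >= 3:
        length = int(bytes(view[pos:pos + 3]))
        end = pos + 3 + length
        if end > len(view):
            break  # incomplete message: leave it in the buffer
        sink.append(bytes(view[pos + 3:end]))
        pos = end
    return pos


received = []
rb = ReceiveBuffer(200, lambda view: parse_messages(view, received))
for chunk in (b"005He", b"llo00", b"3Hey002Hi"):  # simulated partial reads
    space = rb.free_space()
    space[:len(chunk)] = chunk
    rb.commit(len(chunk))
print(received)
```

A production version would also have to decide what to do when free_space() is empty, that is, when a single message is larger than the buffer.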

How do you read without specifying the length of a byte slice beforehand, with the net.TCPConn in golang?

I was trying to read some messages from a TCP connection with a redis client (a terminal just running redis-cli). However, the Read method in the net package requires me to pass in a slice as an argument. Whenever I give a slice with no length, the connection crashes and the Go program halts. I am not sure beforehand what length my messages are going to be. So unless I specify some slice that is ridiculously large, this connection will always close, though this seems wasteful. I was wondering, is it possible to keep a connection without having to know the length of the message beforehand? I would love a solution to my specific problem, but I feel that this question is more general. Why do I need to know the length beforehand? Can't the library just give me a slice of the correct size?
Or what other solution do people suggest?
Not knowing the message size is precisely the reason you must specify the Read size (this goes for any networking library, not just Go). TCP is a stream protocol. As far as the TCP protocol is concerned, the message continues until the connection is closed.
If you know you're going to read until EOF, use ioutil.ReadAll (io.ReadAll since Go 1.16).
Calling Read isn't guaranteed to get you everything you're expecting. It may return less, it may return more, depending on how much data you've received. Libraries that do IO typically read and write through a "buffer"; you would have your "read buffer", which is a pre-allocated slice of bytes (up to 32k is common), and you re-use that slice each time you want to read from the network. This is why IO functions return the number of bytes, so you know how much of the buffer was filled by the last operation. If the buffer was filled, or you're still expecting more data, you just call Read again.
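The same pattern, sketched in Python rather than Go (an assumption made only for illustration): socket.recv_into plays the role of Read, filling a pre-allocated buffer and returning how many bytes it wrote. read_chunks is a hypothetical helper name.

```python
import socket


def read_chunks(sock: socket.socket, buf_size: int = 32 * 1024):
    """Yield whatever each read delivers, reusing one pre-allocated buffer."""
    buf = bytearray(buf_size)      # allocated once, reused for every read
    while True:
        n = sock.recv_into(buf)    # n tells us how much of buf was filled
        if n == 0:                 # 0 bytes means the peer closed the connection
            return
        yield bytes(buf[:n])       # only the first n bytes are valid


if __name__ == "__main__":
    # A connected local socket pair stands in for the network connection.
    writer, reader = socket.socketpair()
    writer.sendall(b"0123456789")
    writer.close()
    for chunk in read_chunks(reader, buf_size=4):
        print(len(chunk), chunk)
    reader.close()
```

The buffer size only bounds how much one call can return; it says nothing about message length, which still has to come from the application protocol.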
A bit late but...
One of the questions was how to determine the message size. The answer given by JimB was that TCP is a streaming protocol, so there is no real end.
I believe this answer is incorrect. TCP divides a byte stream into sequential packets. Each packet has an IP header and a TCP header (see Wikipedia). The IP header of each packet contains a field for the length of that packet. You would have to do some math to subtract the IP and TCP header lengths to arrive at the actual data length.
In addition, the maximum length of a message can be specified in the TCP header.
Thus you can provide a buffer of sufficient length for your read operation. However, you have to read the packet header information first. You probably should not accept a TCP connection if the max message size is longer than you are willing to accept.
Normally the sender terminates the connection with a FIN packet, not an EOF character.
An EOF from the read operation means that the peer's FIN has been received and all buffered data has already been read; a transfer that is cut short instead shows up as a timeout or reset error.

How to manage multi-packet sends with gsocket?

I have a question regarding TCP/IP socket networking. Basically it is this: are there any parts of TCP/IP that I can leverage to help manage multi-packet sends? For example, I want to send a 100 MB binary file, which would take something like 70-80 TCP packets. Meanwhile I have a relatively fast polling receive on the other side. Would my receive have to receive each packet individually and "stitch" together the data packet by packet, looking for some size to be reached (it can look at the opcode and determine the size), or is there some way to tell TCP "hey, I'm sending 100 MB here, let them know when it is finished"?
I am using glib's low level socket library (gsocket).
When using a binary encoding like, say protocol buffers, you would wrap the actual payload by inserting a header that would include the information necessary to decode the payload on the other end.
Say, prepending 8 bytes, where the first 4 signify the type of the encoded message and the second 4 indicate the length of the entire message.
On the receiving side you are then reading this header, that's part of the payload, to determine the message type and length of the message. This lets you combine multiple messages in one payload or split messages across packets and reliably recombine them.
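As an illustration of such a header, here is a small Python sketch. It assumes network byte order and takes "length of the entire message" to mean header plus payload; both choices, like the names pack_message and unpack_messages, are assumptions for this example only.

```python
import struct

# Hypothetical 8-byte header: 4-byte message type, 4-byte total length
# (header + payload), both unsigned, network byte order.
HEADER = struct.Struct("!II")


def pack_message(msg_type: int, payload: bytes) -> bytes:
    """Wrap an already-encoded payload with the type/length header."""
    return HEADER.pack(msg_type, HEADER.size + len(payload)) + payload


def unpack_messages(buf: bytearray) -> list:
    """Remove and return every complete (type, payload) pair at the front of buf."""
    out = []
    while len(buf) >= HEADER.size:
        msg_type, total = HEADER.unpack_from(buf)
        if total < HEADER.size:
            raise ValueError("length field smaller than the header itself")
        if len(buf) < total:
            break  # rest of this message has not arrived yet
        out.append((msg_type, bytes(buf[HEADER.size:total])))
        del buf[:total]
    return out


wire = pack_message(1, b"hello") + pack_message(2, b"")
buf = bytearray()
buf += wire[:10]             # first read stops in the middle of message 1
print(unpack_messages(buf))  # nothing complete yet
buf += wire[10:]             # second read delivers the rest
print(unpack_messages(buf))
```

With gsocket the bytes would come from whatever receive call you already poll with; the accumulation logic stays the same.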

Is there a good way to frame a protocol so data corruption can be detected in every case?

Background: I've spent a while working with a variety of device interfaces and have seen a lot of protocols, many serial and UDP in which data integrity is handled at the application protocol level. I've been seeking to improve my receive routine handling of protocols in general, and considering the "ideal" design of a protocol.
My question is: is there any protocol framing scheme out there that can definitively identify corrupt data in all cases? For example, consider the standard framing scheme of many protocols:
Field: Length in bytes
<SOH>: 1
<other framing information>: arbitrary, but fixed for a given protocol
<length>: 1 or 2
<data payload etc.>: based on length field (above)
<checksum/CRC>: 1 or 2
<ETX>: 1
For the vast majority of cases, this works fine. When you receive some data, you search for the SOH (or whatever your start byte sequence is), move forward a fixed number of bytes to your length field, and then move that number of bytes (plus or minus some fixed offset) to the end of the packet to your CRC, and if that checks out you know you have a valid packet. If you don't have enough bytes in your input buffer to find an SOH or to have a CRC based on the length field, then you wait until you receive enough to check the CRC. Disregarding CRC collisions (not much we can do about that), this guarantees that your packet is well formed and uncorrupted.
However, if the length field itself is corrupt and has a high value (which I'm running into), then you can't check the (corrupt) packet's CRC until you fill up your input buffer with enough bytes to meet the corrupt length field's requirement.
So is there a deterministic way to get around this, either in the receive handler or in the protocol design itself? I can set a maximum packet length or a timeout to flush my receive buffer in the receive handler, which should solve the problem on a practical level, but I'm still wondering if there's a "pure" theoretical solution that works for the general case and doesn't require setting implementation-specific maximum lengths or timeouts.
Thanks!
The reason why all protocols I know of, including those handling "streaming" data, chop up the datastream in smaller transmission units each with their own checks on board is exactly to avoid the problems you describe. Probably the fundamental flaw in your protocol design is that the blocks are too big.
The accepted answer of this SO question contains a good explanation and a link to a very interesting (but rather heavy on math) paper about this subject.
So in short, you should stick to smaller transmission units not only because of practical programming related arguments but also because of the message length's role in determining the security offered by your crc.
One way would be to encode the length parameter so that it would be easily detected to be corrupted, and save you from reading in the large buffer to check the CRC.
For example, the XModem protocol embeds an 8-bit packet number followed by its one's complement.
It could mean doubling your length block size, but it's an option.
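A small Python sketch of that idea applied to a length field. The 16-bit width and the function names are assumptions for illustration. The check catches any corruption confined to one of the two copies, so the receiver can reject the header immediately instead of waiting for a bogus number of bytes; it is not a guarantee against every possible corruption, since matching errors in both copies would still pass.

```python
import struct

# Hypothetical header field: 16-bit length followed by its one's complement.
LEN_FIELD = struct.Struct("!HH")


def encode_length(length: int) -> bytes:
    """Store the length twice: as-is and bit-inverted."""
    return LEN_FIELD.pack(length, length ^ 0xFFFF)


def decode_length(field: bytes):
    """Return the length, or None if the two copies disagree."""
    length, check = LEN_FIELD.unpack(field)
    if length ^ check != 0xFFFF:
        return None  # corrupt header: resynchronise instead of buffering
    return length


good = encode_length(300)
bad = bytes([good[0] ^ 0x80]) + good[1:]  # flip the top bit of the length
print(decode_length(good), decode_length(bad))
```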

What are all the differences between pipes and message queues?

What are all the differences between pipes and message queues?
Please explain both from vxworks & unix perspectives.
I think pipes are unidirectional but message queues aren't.
But don't pipes internally use message queues, then how come pipes are unidirectional but message queues are not?
What are the other differences you can think of (from design or usage or other perspectives)?
Message Queues are:
UNIDIRECTIONAL
Fixed number of entries
Each entry has a maximum size
All the queue memory (# entries * entry size) allocated at creation
Datagram-like behavior: reading an entry removes it from the queue. If you don't read the entire data, the rest is lost. For example: send a 20 byte message, but the receiver reads 10 bytes. The remaining 10 bytes are lost.
Task can only pend on a single queue using msgQReceive (there are ways to change that with alternative API)
When sending, you will pend if the queue is full (and you don't do NO_WAIT)
When receiving, you will pend if the queue is empty (and you don't do NO_WAIT)
Timeouts are supported on receive and send
Pipes
Are a layer over message Queues <--- Unidirectional!
Have a maximum number of elements and each element has maximum size
Is NOT A STREAMING INTERFACE. Datagram semantics, just like message queues
On read, WILL PEND until there is data to read
On write, WILL PEND until there is space in the underlying message queue
Can use select facility to wait on multiple pipes
That's what I can think of right now.
"VxWorks pipes differ significantly from UNIX pipes", says the vxWorks documentation, and they ain't kidding. Here are the manpages.
It looks like it would not be exaggerating much to say that the only similarity between Unix pipes and vxWorks pipes is that they're both a form of IPC. The features are different, the APIs are different, and the implementations are surely very different.
I also found this description of the differences in IPC on UNIX. It states that the difference between message queues and pipes is that the former store and retrieve information in packets, while pipes do so character by character.
Msg queue:
Message queue: An anonymous data stream similar to the pipe, but stores
and retrieves information in packets.
Pipe
Pipe: A two-way data stream interfaced through standard input and
output and is read character by character
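The packet versus byte-stream distinction can be seen on a Unix system with a few lines of Python. This is only a sketch: a Unix datagram socket pair is used here as a stand-in for a message queue (an assumption for illustration, it is not the message queue API itself), and the function names are made up.

```python
import os
import socket


def pipe_demo():
    """Unix pipe: a byte stream, write boundaries are not preserved."""
    r, w = os.pipe()
    os.write(w, b"ABCD")
    os.write(w, b"PQR")
    first = os.read(r, 2)    # a short read takes only part of the first write
    rest = os.read(r, 100)   # the remainder is still there, merged with the second write
    os.close(r)
    os.close(w)
    return first, rest


def datagram_demo():
    """Datagram socket pair: one read returns at most one message."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    a.send(b"ABCD")
    a.send(b"PQR")
    first = b.recv(2)        # the unread rest of this message is discarded
    second = b.recv(100)     # the next read starts at the next message
    a.close()
    b.close()
    return first, second


if __name__ == "__main__":
    print(pipe_demo())
    print(datagram_demo())
```

The second function shows the same "read 10 of 20 bytes and the rest is lost" behaviour described for the vxWorks queues above, while the Unix pipe keeps the unread bytes.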
I also found this question here: Pipe vs msg queue
Comparison of message queues and pipes:
- ONE message queue may be used to pass data in both directions
- the message needn't to be read on first in-first out basis
but can be processed selectively instead
source: see http://www.cs.vsb.cz/grygarek/dosys/IPC.txt
MQs have kernel persistence, and can be opened by multiple processes.
