TCP/IP programming, data in more than one packet - tcp

I am writing an application in C, using libpcap. My program listens for new packets and parses them
according to a grammar. The payload actually is XML.
Sometimes one packet is not enough for an XML file, so the XML buffer is splitted into separate packets.
I want to add code logic in order to handle these cases. However I don't know in advance that a packet does not contain the whole data. How do I know that a packet has more data that will be send next? How to i recognize that a new packet contains the rest of the data?
Do I have to use the TH_FIN flag? Could you please explain it to me?

There's nothing in TCP that defines packets, that's up to the higher layers to define if they need to - TCP is just a stream.
If this is raw XML over a TCP stream, you actually need to parse the xml - you'll know when you have a whole xml document when you've received the end of the document element.
If it's XML packaged over HTTP , you might be able to parse out the Content-Length: header which should contain the length of the body.
Note, reassembling a TCP stream from captured packets is a very hard problem, there's a lot of corner cases, e.g. you'd need to handle retransmission , out of sequence tcp segments and many more. http://libnids.sourceforge.net/ might help you.

As Anon say use a higher level stream library.
But even then you need to know the chunk side before starting to handle it, as you will read from the stream in block's of n bytes.
Thus you want to first send in binary the number of bytes to be sent, then send x bytes, and repeat, thus when you are receiving the chucks via select/read to know went you have all of chunk one to pass to the processor.

If you're using TCP, use a TCP library that gives you the data as a stream instead of trying to handle the packets yourself.

Stream is good. Another option is to store the incoming data in a buffer (eg char*) and search for application messaging framing characters or in the case of Xml, the root end tag. Once you've found a complete xml message at the front of the buffer, pull it out and process.

The XMPP instant messaging protocol, used by Jabber, has means to move XML chunks over a TCP stream. I don't know how exactly it is done myself, but RFC 3290 is the protocol definition. You should be able to work it out from that.

Related

Figuring out what kind of payload is carried by a packet

I'm working with Scapy to parse a set of .pcap files. I would like to understand what kind of payload those packets are carrying. If I have for example a pcap file with a lot of UDP packets which payloads has the same starting bytes I don't know what kind of encoding was used, and the first values keep repeating in other packets. Is there any program or python library that could allow me to figure out or try to guess what kind of encoding was used (if for example is an RTP payload or MPEG one and so on)?
UPDATE
I was able to use nDPI on those pcap files and it gave me satisfying results for all the flows except for a set of them that it was not able to recognize. I'm going to share with you the first part of the hex representation of the data:
f1d00404d1002d7c484830320000020080073804610d00007b09040000000000010f000000000000000000000000000000000000000000000000000121e002a22e537fcccb815afafce2361b
The first part f1d004 does not change between previous and successive packets. I have already tried to decode them with different protocols using wireshark's feature "Decode as". I have tried with RTP,RTCP,RTSP,JSON,MPEG. If can be useful, this is the capture related to a camera, that's why I tried the previous protocols.

NiFi forward/duplicate TCP Stream

I'm supposed to duplicate a binary TCP Stream.
So I set up a NiFi 1.9.0 server, put in a ListenTCP processor and a PutTCP processor, configured the proper IPs and Ports and connected them.
So far so good, the packets were received by the ListenTCP processor and also forwareded by the PutTCP processor.
But NiFi seems to mess around with the data somehow, the sent packets aren't exactly the same as received. I expected NiFi to just forward everything 1:1 but something is happening and I cannot find out what.
I've been playing around with the Character Set, Max Batch Size and Batching Message Delemiter settings on the ListenTCP processor and also with the Outgoing Message Delemiter and Character Set on the PutTCP processor.
I also messed around with a MergeContent processor but didn't get it to work properly.
Here you can see the difference between received (red) and sent data (captured using tcpflow).
Link to picture
Another problem is that I don't really know the data I'm processing, it says in the documentation:
These log files are in the machine-readable binary format that is described by the XML file called ebm.xml.
and
The streamed events are in the TCP-based binary format.
I do have access to ebm.xml file, but not sure how I can make use of it.
Anyone an idea how I can get NiFi to simply forward everything?
I'm new to NiFi, so I might have missed some possibilites...
The ListenTCP processor reads data from the stream using a new-line character as a logical message separator. For example, if the stream had:
<chunk1><new-line><chunk2><new-line><chunk3><new-line>
It would result in reading chunk1, chunk2, and chunk3 into an internal queue.
When it writes them back out it uses the outgoing message delimiter. So the outgoing flow file would be:
<chunk1><outgoing-delim><chunk2><outgoing-delim><chunk3><outgoing-delim>
Unfortunately it is more geared towards receiving textual data such as logs which are typically line-delimited. The chunks should be passing through unaltered as byte[], but typically binary data wouldn't have these logical new-line boundaries, so I'm not sure how well it works for that.

Packets sometimes get concatenated

I'm trying to make a simple server/application in Erlang.
My server initialize a socket with gen_tcp:listen(Port, [list, {active, false}, {keepalive, true}, {nodelay, true}]) and the clients connect with gen_tcp:connect(Server, Port, [list, {active, true}, {keepalive, true}, {nodelay, true}]).
Messages received from the server are tested by guards such as {tcp, _, [115, 58 | Data]}.
Problem is, packets sometimes get concatenated when sent or received and thus cause unexpected behaviors as the guards consider the next packet as part of the variable.
Is there a way to make sure every packet is sent as a single message to the receiving process?
Plain TCP is a streaming protocol with no concept of packet boundaries (like Alnitak said).
Usually, you send messages in either UDP (which has limited per-packet size and can be received out of order) or TCP using a framed protocol.
Framed meaning you prefix each message with a size header (usualy 4 bytes) that indicates how long the message is.
In erlang, you can add {packet,4} to your socket options to get framed packet behavior on top of TCP.
assuming both sides (client/server) use {packet,4} then you will only get whole messages.
note: you won't see the size header, erlang will remove it from the message you see. So your example match at the top should still work just fine
You're probably seeing the effects of Nagle's algorithm, which is designed to increase throughput by coalescing small packets into a single larger packet.
You need the Erlang equivalent of enabling the TCP_NODELAY socket option on the sending socket.
EDIT ah, I see you already set that. Hmm. TCP doesn't actually expose packet boundaries to the application layer - by definition it's a stream protocol.
If packet boundaries are important you should consider using UDP instead, or make sure that each packet you send is delimited in some manner. For example, in the TCP version of DNS each message is prefixed by a 2 byte length header, which tells the other end how much data to expect in the next chunk.
You need to implement a delimiter for your packets.
One solution is to use a special character ; or something similar.
The other solution is to send the size of the packet first.
PacketSizeInBytes:Body
Then read the provided amount of bytes from your message. When you're at the end you got your whole packet.
Nobody mentions that TCP may also split your message into multiple pieces (split your packet into two messages).
So the second solution is the best of all. But a little hard. While the first one is still good but limits your ability to send packets with special characters. But the easiest to implement. Ofc theres a workaround for all of this. I hope it helps.

Nagle-Like Problem

so I have this real-time game, with a C++ sever with disabled nagle using SFML library , and client using asyncsocket, also disables nagle. I'm sending 30 packets every 1 second. There is no problem sending from the client to the server, but when sending from the server to the clients, some of the packets are migrating. For example, if I'm sending "a" and "b" in completly different packets, the client reads it as "ab". It's happens just once a time, but it makes a real problem in the game.
So what should I do? How can I solve that? Maybe it's something in the server? Maybe OS settings?
To be clear: I AM NOT using nagle but I still have this problem. I disabled in both client and server.
For example, if I'm sending "a" and "b" in completly different packets, the client reads it as "ab". It's happens just once a time, but it makes a real problem in the game.
I think you have lost sight of the fundamental nature of TCP: it is a stream protocol, not a packet protocol. TCP neither respects nor preserves the sender's data boundaries. To put it another way, TCP is free to combine (or split!) the "packets" you send, and present them on the receiver any way its wants. The only restriction that TCP honors is this: if a byte is delivered, it will be delivered in the same order in which it was sent. (And nothing about Nagle changes this.)
So, if you invoke send (or write) on the server twice, sending these six bytes:
"packet" 1: A B C
"packet" 2: D E F
Your client side might recv (or read) any of these sequences of bytes:
ABC / DEF
ABCDEF
AB / CD / EF
If your application requires knowledge of the boundaries between the sender's writes, then it is your responsibility to preserve and transmit that information.
As others have said, there are many ways to go about that. You could, for example, send a newline after each quantum of information. This is (in part) how HTTP, FTP, and SMTP work.
You could send the packet length along with the data. The generalized form for this is called TLV, for "Type, Length, Value". Send a fixed-length type field, a fixed-length length field, and then an arbitrary-length value. This way you know when you have read the entire value and are ready for the next TLV.
You could arrange that every packet you send is identical in length.
I suppose there are other solutions, and I suppose that you can think of them on your own. But first you have to realize this: TCP can and will merge or break your application packets. You can rely upon the order of the bytes' delivery, but nothing else.
You have to disable Nagle in both peers. You might want to find a different protocol that's record-based such as SCTP.
EDIT2
Since you are asking for a protocol here's how I would do it:
Define a header for the message. Let's say I would pick a 32 bits header.
Header:
MSG Length: 16b
Version: 8b
Type: 8b
Then the real message comes in, having MSG Length bytes.
So now that I have a format, how would I handle things ?
Server
When I write a message, I prepend the control information (the length is the most important, really) and send the whole thing. Having NODELAY enabled or not makes no difference.
Client
I continuously receive stuff from the server, right ? So I have to do some sort of read.
Read bytes from the server. Any amount can arrive. Keep reading until you've got at least 4 bytes.
Once you have these 4 bytes, interpret them as the header and extract the MSG Length
Keep reading until you've got at least MSG Length bytes. Now you've got your message and can process it
This works regardless of TCP options (such as NODELAY), MTU restrictions, etc.

Non-blocking socket with TCP

I'm writing a program using Java non-blocking socket and TCP. I understand that TCP is a stream protocol but the underlayer IP protocol uses packets. When I call SocketChannel.read(ByteBuffer dst), will I always get the whole content of IP packets? or it may end at any position in the middle of a packet?
This matters because I'm trying to send individual messages through the channel, each messages are small enough to be sent within a single IP packet without being fragmented. It would be cool if I can always get a whole message by calling read() on the receiver side, otherwise I have to implement some method to re-assembly the messages.
Edit: assume that, on the sender side, messages are sent with a long interval(like 1 second), so they aren't going to group together in one IP packet. On the receiver side, the buffer used to call read(ByteBuffer dst) is big enough to hold any message.
TCP is a stream of bytes. Each read will receive between 1 and the maximum of the buffer size that you supplied and the number of bytes that are available to read at that time.
TCP knows nothing of your concept of messages. Each send by client can result in 0 or more reads being required at the other end. Zero or more because you might get a single read that returns more than one of your 'messages'.
You should ALWAYS write your read code such that it can deal with your message framing and either reassemble partial messages or split multiple ones.
You may find that if you don't bother with this complexity then your code will seem to 'work' most of the time, don't rely on that. As soon as you are running on a busy network or across the internet, or as soon as you increase the size of your messages you WILL be bitten by your broken code.
I talk about TCP message framing some more here: http://www.serverframework.com/asynchronousevents/2010/10/message-framing-a-length-prefixed-packet-echo-server.html and here: http://www.serverframework.com/asynchronousevents/2010/10/more-complex-message-framing.html though it's in terms of a C++ implementation so it may or may not be of interest to you.
The socket API makes no guarantee that send() and recv() calls correlate to datagrams for TCP sockets. On the sending side, things may get regrouped already, e.g. the system may defer sending one datagram to see whether the application has more data; on the receiving side, a read call may retrieve data from multiple datagrams, or a partial datagram if the size specified by the caller is requires breaking packet.
IOW, the TCP socket API assumes you have a stream of bytes, not a sequence of packets. You need make sure you keep calling read() until you have enough bytes for a request.
From the SocketChannel documentation:
A socket channel in non-blocking mode, for example, cannot read
any more bytes than are immediately available from the socket's input buffer;
So if your destination buffer is large enough, you are supposed to be able to consume the whole data in the socket's input buffer.

Resources