Can TCP handle a stream which never ends in a single connection?

This is more of a theoretical question. Say there is an infinite data source that keeps pushing data every second: some device that monitors "solar events" and sends events to a back-end system continuously, every nanosecond (meaning it's a continuous stream). The back-end system wants to transmit the live data to another remote system over TCP. Can TCP handle the infinite data stream in a single TCP connection?
I'm aware of the sequence number limitation, but with TCP timestamps the sequence numbers will wrap around properly, so that should not pose a problem. Also, assume that the system has several terabytes of memory (which can be considered close to an infinite memory model). If I just give the base address of where the stream starts, will TCP be able to proceed (segmenting, transmitting, retransmitting, etc.) continuously in a single TCP connection, without caring whether the data ever ends?
My guess is that since TCP never expects any stream length parameter, it should be possible. Am I right?

Basically, yes. As long as the data is byte ('octet') aligned, data on TCP streams can be piped anywhere (see any router). TCP is a byte stream: it doesn't care about message boundaries. The windowed protocol has built-in flow control, so it should all work.
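To illustrate why the 32-bit sequence space does not limit the stream length: sequence numbers are compared modulo 2^32, so a wrap is just ordinary modular arithmetic (the timestamp option mentioned in the question additionally helps reject old duplicates on very fast links). A minimal sketch; the helper names seq_add and seq_lt are made up for this example and are not part of any TCP API:

```python
MOD = 1 << 32  # TCP sequence numbers are 32 bits wide

def seq_add(seq: int, nbytes: int) -> int:
    """Advance a sequence number by nbytes, wrapping at 2**32."""
    return (seq + nbytes) % MOD

def seq_lt(a: int, b: int) -> bool:
    """True if a comes before b in wrap-around order.

    Only meaningful when a and b are less than 2**31 apart.
    """
    return a != b and ((b - a) % MOD) < (1 << 31)

near_end = MOD - 100
after_wrap = seq_add(near_end, 1460)   # one 1460-byte segment later
print(after_wrap)                      # 1360
print(seq_lt(near_end, after_wrap))    # True, although 1360 is numerically smaller
```

Nothing in this arithmetic depends on how many times the counter has already wrapped, which is why the stream never has to end.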

Related

Parsing TCP Packets back together

I am working on a tool that takes a pcap file (from wireshark in this case), and attempts to parse out data from the TCP packets.
Now in this case, I only care about the data in one direction. So my logic was to sort each Wireshark-captured packet into a list keyed by protocol-destIP-sourceIP-destPort-sourcePort.
So from this point, I now have a list of only packets for one direction on a particular port.
From there I just want to be able to walk through the bodies of the TCP payloads in order. Is it as simple as then going in order by sequence numbers?
I would simply take the first sequence number captured, add the payload size to it and expect that to be the next TCP packet sent? Is there more to this that I am missing?
I noticed when sorting the packets this way that eventually I would come to a sequence that doesn't make sense. I guess I could just assume that is the start of the next stream? I know it becomes more difficult if I have to consider traffic going back and forth, but in this case I only want to watch packets in one direction.

Wireshark captures the packet on the wire. It might happen that the packets don't arrive in sequential order, that packets are corrupt (bad checksum), that there are duplicate packets ... - and this is ignoring possible attacks designed to confuse the analysis. The TCP stack will take care of all these problems so that the application gets the right packets, but Wireshark works outside the TCP stack. Thus, while in most cases your simple procedure will likely work (assuming that you at least check TCP flags for start and end of connection), it might fail in some cases.
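A rough sketch of what "going in order by sequence numbers" has to cope with (reordering, duplicates, overlapping retransmissions, gaps). It assumes the (sequence number, payload) pairs for one direction of one connection have already been extracted from the pcap by whatever parser you use; the function name reassemble is hypothetical, and sequence-number wrap-around and checksum validation are deliberately left out:

```python
def reassemble(segments, initial_seq):
    """Rebuild one direction of a TCP byte stream.

    segments:    iterable of (seq, payload) in capture order
    initial_seq: sequence number of the first data byte (SYN seq + 1)
    Returns (stream_bytes, leftover) where leftover holds segments that
    could not be placed because an earlier segment is missing.
    """
    stream = bytearray()
    next_seq = initial_seq
    pending = {}  # seq -> payload for segments not yet consumed

    for seq, payload in segments:
        if not payload:
            continue  # pure ACKs carry no data
        # For a repeated seq, keep the longest payload seen.
        if len(payload) > len(pending.get(seq, b"")):
            pending[seq] = payload
        # Consume everything that is now contiguous.
        while pending:
            s = min(pending)
            if s > next_seq:
                break  # gap: wait for the missing segment
            data = pending.pop(s)
            end = s + len(data)
            if end > next_seq:  # skip bytes we already have
                stream += data[next_seq - s:]
                next_seq = end
    return bytes(stream), pending

capture = [(1000, b"hel"), (1006, b"wor"), (1003, b"lo "),
           (1003, b"lo "), (1009, b"ld")]
print(reassemble(capture, 1000))  # (b'hello world', {})
```

A non-empty leftover dictionary at the end means the capture itself is missing a segment, which is one of the cases where the simple procedure breaks down.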

why does TCP buffer data on receiver side

In most descriptions of the TCP PUSH function, it is mentioned that the PUSH feature not only requires the sender to send the data immediately (without waiting for its buffer to fill), but also requires that the data be pushed to receiving application on the receiver side, without being buffered.
What I don't understand is why TCP would buffer data on the receiving side at all. After all, TCP segments travel in IP datagrams, which are processed in their entirety (i.e. the IP layer delivers only an entire segment to the TCP layer, after doing any necessary reassembly of fragments of the IP datagram that carried the segment). Then why would the receiving TCP layer wait to deliver this data to its application? One case could be if the application were not reading the data at that point in time. But if that is the case, then forcibly pushing the data to the application is not possible anyway. Thus, my question is: why does the PUSH feature need to dictate anything about receiver-side behavior? Given that an application is reading data at the time a segment arrives, that segment should be delivered to the application straight away anyway.
Can anyone please help resolve my doubt?

TCP must buffer received data because it doesn't know when the application is actually going to read the data, and it has already told the sender how much it is willing to receive (the available "window"). All this data gets stored in the "receive window" until such time as it gets read out by the application.
Once the application reads the data, TCP drops that data from the receive window and increases the window size it reports back to the sender with the next ACK. If this window did not exist, then the sender would have to hold off sending until the receiver told it to go ahead, which the receiver could not do until the application issued a read. That would add a full round-trip delay of latency to every read call, if not more.
Most modern implementations also make use of this buffer to keep out-of-order packets received so that the sender can retransmit only the lost ones rather than everything after it as well.
The PSH bit is generally not acted upon. Yes, implementations send it, but it typically doesn't change the behavior of the receiving end.
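As a toy illustration of the relationship between the receive buffer and the advertised window (a simplified model, not how any real stack is written; the class name ReceiveBuffer and its methods are invented for this sketch):

```python
class ReceiveBuffer:
    """Toy model of a TCP receive buffer holding in-order data."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = bytearray()

    @property
    def window(self) -> int:
        """Free space, i.e. what would be advertised in the next ACK."""
        return self.capacity - len(self.data)

    def on_segment(self, payload: bytes) -> int:
        """Store arriving payload; return how many bytes were accepted."""
        accepted = payload[: self.window]
        self.data += accepted
        return len(accepted)

    def read(self, n: int) -> bytes:
        """Application read: frees space, so the window opens again."""
        out = bytes(self.data[:n])
        del self.data[:n]
        return out

buf = ReceiveBuffer(capacity=8)
buf.on_segment(b"abcde")   # arrives before the application reads anything
print(buf.window)          # 3
buf.read(2)                # application finally reads two bytes
print(buf.window)          # 5
```

The segment is accepted and acknowledged even though no read is outstanding, which is exactly why the buffer has to exist.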

Note that, although the other comments are correct (the PSH bit doesn't impact application behaviour much at all in most implementations), it's still used by TCP to determine ACK behaviour. Specifically, when the PSH bit is set, the receiving TCP will ACK immediately instead of using delayed ACKs. Minor detail ;)

How would I simulate TCP-RTM using/in NS2?

Here is a paper named "TCP-RTM: Using TCP for Real Time Multimedia Applications" by Sam Liang, David Cheriton.
The paper adapts TCP for use in real-time applications.
The two major modifications that I would like help with are:
On application-level read on the TCP connection, if there is no in sequence data queued to read but one or more out-of-order packets are queued for the connection, the first contiguous range of out-of-order packets is moved from the out-of-order queue to the receive queue, the receive pointer is advanced beyond these packets, and the resulting data delivered to the application. On reception of an out-of-order packet with a sequence number logically greater than the current receive pointer (rcv next ptr) and with a reader waiting on the connection, the packet data is delivered to the waiting receiver, the receive pointer is advanced past this data and this new receive pointer is returned in the next acknowledgment segment.
In the case that the sender’s send-buffer is full due to large amount of backlogged data, TCP-RTM discards the oldest data segment in the buffer and accepts the new data written by the application. TCP-RTM also advances its send-window past the discarded data segment. This way, the application write calls are never blocked and the timing of the sender application is not broken.
The authors implemented this by modifying the TCP Reno with SACK code in an old Linux 2.2 kernel, in a real environment.
But I want to simulate this in NS2.
I can work with NS2 (e.g. analyzing results, making performance graphs, etc.). I looked through all the related files but can't find where to make the change.
Could you please help me do this?
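Before touching the simulator, it can help to pin down the two quoted rules as a small standalone model. The following is only a Python toy under simplifying assumptions (whole segments, no wrap-around, no ACK generation, and only the read-time variant of the receiver rule); it is not NS2 code, and the class names RtmSender and RtmReceiver are invented for this sketch:

```python
from collections import deque

class RtmSender:
    """Send buffer that never blocks the writer: when full, drop the oldest segment."""

    def __init__(self, max_segments: int):
        self.buf = deque()
        self.max_segments = max_segments
        self.dropped = 0

    def write(self, segment: bytes) -> None:
        if len(self.buf) == self.max_segments:
            self.buf.popleft()      # discard the oldest backlogged segment
            self.dropped += 1
        self.buf.append(segment)

class RtmReceiver:
    """Receiver that skips over a hole when the reader would otherwise stall."""

    def __init__(self, rcv_next: int):
        self.rcv_next = rcv_next
        self.in_order = bytearray()
        self.out_of_order = {}      # seq -> payload

    def _drain(self) -> None:
        # Move every segment that is now contiguous into the receive queue.
        while self.rcv_next in self.out_of_order:
            payload = self.out_of_order.pop(self.rcv_next)
            self.in_order += payload
            self.rcv_next += len(payload)

    def on_segment(self, seq: int, payload: bytes) -> None:
        if seq >= self.rcv_next:    # data behind the pointer is ignored
            self.out_of_order[seq] = payload
            self._drain()

    def read(self) -> bytes:
        if not self.in_order and self.out_of_order:
            # No in-sequence data: give up on the hole and jump ahead.
            self.rcv_next = min(self.out_of_order)
            self._drain()
        data = bytes(self.in_order)
        self.in_order.clear()
        return data

r = RtmReceiver(rcv_next=0)
r.on_segment(0, b"aa")
r.on_segment(4, b"cc")      # the segment at seq 2 was lost
r.on_segment(6, b"dd")
print(r.read())             # b'aa'
print(r.read())             # b'ccdd' (the hole at seq 2 is skipped)
```

A model like this gives concrete behavior to compare against once the corresponding receive and send paths in the simulator's TCP agent are modified.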

TCP Window size libnids

My intent is to write an application-layer process on top of libnids. The reason for using the libnids API is that it can emulate Linux kernel TCP functionality. Libnids returns hlf->count_new, which is the number of bytes received since the last invocation of the TCP callback function. However, tcp_callback is called every time a new packet comes in, so hlf->count_new covers a single TCP segment.
However, the app. layer is supposed to receive the TCP window buffer, not separate TCP segments.
Is there any way to get the data of the TCP window (and not the TCP segment)? In other words, to make libnids deliver the TCP window buffer data.
thanks in advance!

You have a misunderstanding. The TCP window is designed to control the amount of data in flight. Application reads do not always trigger TCP window changes. So the information you seek is not available in the place you are looking.
Consider, for example, if the window is 128KB and eight bytes have been sent. The receiving TCP stack must acknowledge those eight bytes regardless of whether the application reads them or not, otherwise the TCP connection will time out. Now imagine the application reads a single byte. It would be pointless for the TCP stack to enlarge the window by one byte -- and if window scaling is in use, it can't do that even if it wants to.
And then what? If four seconds later the application reads another single byte, adjust the window again? What would be the point?
The purpose of the window is to control data flow between the two TCP stacks, prevent the buffers from growing infinitely, and control the amount of data 'in flight'. It only indirectly reflects what the application has read from the TCP stack.
It is also strange that you would even want this. Even if you could tell what had been read by the application, of what possible use would that be to you?
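To make the window-scaling remark concrete: with scaling, the 16-bit window field is expressed in units of 2^shift bytes, so a one-byte change in free buffer space usually cannot be represented at all. A small sketch (the shift of 7 is just an example value, and advertised_window is a hypothetical helper, not a real API):

```python
def advertised_window(free_bytes: int, shift: int):
    """Return (16-bit header field, window size the peer reconstructs)."""
    field = min(free_bytes >> shift, 0xFFFF)   # value carried in the TCP header
    return field, field << shift               # peer multiplies by 2**shift

print(advertised_window(131072, 7))  # (1024, 131072)
print(advertised_window(131073, 7))  # (1024, 131072): the extra byte is invisible
print(advertised_window(131200, 7))  # (1025, 131200)
```

With a shift of 7 the advertised window can only move in 128-byte steps, so it cannot track individual application reads.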

Non-blocking socket with TCP

I'm writing a program using Java non-blocking sockets and TCP. I understand that TCP is a stream protocol but that the underlying IP protocol uses packets. When I call SocketChannel.read(ByteBuffer dst), will I always get the whole content of an IP packet, or may the read end at any position in the middle of a packet?
This matters because I'm trying to send individual messages through the channel, and each message is small enough to be sent within a single IP packet without being fragmented. It would be cool if I could always get a whole message by calling read() on the receiver side; otherwise I have to implement some method to reassemble the messages.
Edit: assume that, on the sender side, messages are sent at long intervals (like 1 second), so they aren't going to be grouped together in one IP packet. On the receiver side, the buffer passed to read(ByteBuffer dst) is big enough to hold any message.

TCP is a stream of bytes. Each read will return between 1 byte and the smaller of the buffer size that you supplied and the number of bytes that are available to read at that time.
TCP knows nothing of your concept of messages. Each send by client can result in 0 or more reads being required at the other end. Zero or more because you might get a single read that returns more than one of your 'messages'.
You should ALWAYS write your read code such that it can deal with your message framing and either reassemble partial messages or split multiple ones.
You may find that if you don't bother with this complexity then your code will seem to 'work' most of the time. Don't rely on that. As soon as you are running on a busy network or across the internet, or as soon as you increase the size of your messages, you WILL be bitten by your broken code.
I talk about TCP message framing some more here: http://www.serverframework.com/asynchronousevents/2010/10/message-framing-a-length-prefixed-packet-echo-server.html and here: http://www.serverframework.com/asynchronousevents/2010/10/more-complex-message-framing.html though it's in terms of a C++ implementation so it may or may not be of interest to you.
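For illustration, here is a minimal length-prefixed framing sketch. It is written in Python for brevity rather than Java or the C++ of the linked articles, and the 4-byte big-endian length header, the frame function and the FrameDecoder class are assumptions made for this example. The same accumulate-then-split logic applies to bytes returned by SocketChannel.read():

```python
import struct

def frame(body: bytes) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(body)) + body

class FrameDecoder:
    """Turns arbitrary chunks of a byte stream back into whole messages."""

    def __init__(self):
        self.buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add bytes from one read; return every message completed so far."""
        self.buf += chunk
        messages = []
        while len(self.buf) >= 4:
            (length,) = struct.unpack_from(">I", self.buf, 0)
            if len(self.buf) < 4 + length:
                break  # body not complete yet: wait for the next read
            messages.append(bytes(self.buf[4:4 + length]))
            del self.buf[:4 + length]
        return messages

wire = frame(b"hello") + frame(b"world!")
dec = FrameDecoder()
print(dec.feed(wire[:3]))     # []  (not even a full header yet)
print(dec.feed(wire[3:12]))   # [b'hello']
print(dec.feed(wire[12:]))    # [b'world!']
```

The decoder gives the same messages no matter where the reads happen to split the stream, which is the property the receiving code needs.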

The socket API makes no guarantee that send() and recv() calls correlate to datagrams for TCP sockets. On the sending side, things may already get regrouped, e.g. the system may defer sending one datagram to see whether the application has more data; on the receiving side, a read call may retrieve data from multiple datagrams, or a partial datagram if the size specified by the caller requires splitting a packet.
IOW, the TCP socket API assumes you have a stream of bytes, not a sequence of packets. You need to make sure you keep calling read() until you have enough bytes for a request.
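A sketch of that "keep reading until you have enough" loop, shown in Python with a blocking socket for brevity (recv_exactly is a hypothetical helper name; with a non-blocking channel the same accumulation happens across readiness events instead of inside one loop):

```python
import socket

def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a blocking stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))   # may return fewer bytes than asked
        if not chunk:                     # empty result: peer closed the stream
            raise ConnectionError(f"peer closed after {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)

# Two sends, read back with boundaries that do not match the sends.
a, b = socket.socketpair()
a.sendall(b"abc")
a.sendall(b"defgh")
print(recv_exactly(b, 6))   # b'abcdef'
print(recv_exactly(b, 2))   # b'gh'
a.close()
b.close()
```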

From the SocketChannel documentation:
A socket channel in non-blocking mode, for example, cannot read
any more bytes than are immediately available from the socket's input buffer;
So if your destination buffer is large enough, you are supposed to be able to consume the whole data in the socket's input buffer.