With the Sequence number in the TCP header consisting of 32 bits, Does this value wrap around, if so surely does this not cause problems? Again, if so, would this be a problem on long or fast networks, due to the amount of packets in the pipeline?
No, no problem. In fact, the sequence number could even start near the "end" -- it is initialized with some pseudo-random number for anti-spoofing reasons.
Just think of it as a never-ending counter with only the bottom 32 bits showing. There's no problem because we're not actually counting bytes but just enumerating them so there is no confusion as to what bytes are currently being received.
The only limitation is that you could never have more than 4GiB of traffic "in flight" in either direction.
Related
Exactly what the title says. Since the offset number indicates the position of the single fragment in the overall datagram, and since they went to such lengths as dividing the offset by 8, to save space, why isn't the offset a sequential number? It would certainly save more space than dividing the offset by 8, and plus, i doubt that the offset could give much informations about error detection and similar stuff.
So, why isn't it sequential, to save even more space? It would not be offset anymore, it would be like position number, but the meaning would be the same.
If the packets arrive out of order (which is pretty common), having offsets makes it much easier to reassemble the big packet in the memory by putting the fragments right into their place, without additional memory shuffling. If you only had sequence numbers, you wouldn't know where to put the fragment of the data.
For high-speed networking, this might make a noticeable performance difference.
It seems the maximum TCP receive window size is 1GB (when scaling is used). So then the largest RTT that would still make it possible to fill a 100Gb pipe with one connection is 40ms (because 2 * 40E-3 * 100E9 / 8 = 1GB). That would limit that sort of communication speed to a distance IRO 10000 kilometres.
Another scaling problem seems to be that 32-bit sequence numbers don't offer protection against duplicated packets delayed by more than about 400ms (because they wrap around in that amount of time). They also limit the window size to 2GB (because they need to be split between the sender and receiver window).
Three questions:
I am aware of TCP timestamps that can help solve the problem of sequence numbers, but I would like to know if that is a feature that just happens to help but was really designed for some other purpose. Also, I don't understand what it is that timestamps achieve that could not be done simply by increasing the number of bits used for sequence numbers.
I don't understand why the maximum receive window is just 1GB as opposed to 2GB that would presumably be trivially possible with the current headers.
Finally, I would like to know if TCP already scales well enough to be used over the sort of links that are supposedly coming soon.
Many thanks.
The TCP features you're talking about were specified in RFC 1323 in the early 1990s. The limitations you're encountering are justified by discussion text in the RFC:
The sequence number appears in the middle of the TCP segment header and could not have been lengthened without an incompatible change.
Using timestamps allows for the protocol to simultaneously measure round-trip time and protect against wrapped sequence numbers. Making the sequence number bigger would not provide any information about round-trip time.
You need the timestamps in order to measure round-trip time accurately. Measuring round-trip time without timestamps is a sampling problem, and the sampling becomes unsolvable due to aliasing if you get more than 1 error per window.
A 1 GB receive window is the largest that can be kept in sync across the connection. The RFC explains it about as well as can be done:
TCP determines if a data segment is "old" or "new" by testing
whether its sequence number is within 2**31 bytes of the left edge
of the window, and if it is not, discarding the data as "old". To
insure that new data is never mistakenly considered old and vice-
versa, the left edge of the sender's window has to be at most
2**31 away from the right edge of the receiver's window.
Similarly with the sender's right edge and receiver's left edge.
Since the right and left edges of either the sender's or
receiver's window differ by the window size, and since the sender
and receiver windows can be out of phase by at most the window
size, the above constraints imply that 2 * the max window size
must be less than 2**31, or
max window < 2**30
As Jonathon mentioned earlier, these limitations are per-TCP connection. It's tough to think of a scenario where a single application could reach the limits of a single TCP connection, and tougher to think of one where the application couldn't open additional connection(s) if needed.
I'm trying to capture packets and reorganize packets for obtaining original HTTP request.
I'm capturing packets by IPQUEUE(by iptables rule), and I figured out that packets are not captured in order.
I already know that in TCP protocol, packets have to be re-sequenced, so I'm trying to re-sequence packets by sequence number.
According to Wikipedia, the sequence number of TCP is 32 bits number. Then, what happens if sequence number reaches to MAX 32bits number?
Because sequence number of SYN packet is random number, I think this limitation can be reached very fast.
If anybody has a commend on it, or has some links helpful, please leave me a answer.
From RFC-1185
Avoiding reuse of sequence numbers within the same connection is
simple in principle: enforce a segment lifetime shorter than the
time it takes to cycle the sequence space, whose size is
effectively 2**31.
If the maximum effective bandwidth at which TCP
is able to transmit over a particular path is B bytes per second,
then the following constraint must be satisfied for error-free
operation:
2**31 / B > MSL (secs)
So In simpler words TCP will take care of it.
In addition of this condition TCP also has concept of Timestamps to handle sequence number wrap around condition. From the same above RFC
Timestamps carried from sender to receiver in TCP Echo options can
also be used to prevent data corruption caused by sequence number
wrap-around, as this section describes.
Specifically TCP uses PAWS mechanism to handle TCP wrap around case.
You can find more information about PAWS in RFC-1323
RFC793 Section 3.3:
It is essential to remember that the actual sequence number space is
finite, though very large. This space ranges from 0 to 2*32 - 1.
Since the space is finite, all arithmetic dealing with sequence
numbers must be performed modulo 2*32. This unsigned arithmetic
preserves the relationship of sequence numbers as they cycle from
2**32 - 1 to 0 again. There are some subtleties to computer modulo
arithmetic, so great care should be taken in programming the
comparison of such values.
Any arithmetics done on the sequence number are modulo 2^32
In simple terms, the 32-bit unsigned number will wrap around:
...
0xFFFFFFFE
0xFFFFFFFF
0x00000000
0x00000001
...
have a look at tcp timestamps section of https://en.wikipedia.org/wiki/Transmission_Control_Protocol
Background: I've spent a while working with a variety of device interfaces and have seen a lot of protocols, many serial and UDP in which data integrity is handled at the application protocol level. I've been seeking to improve my receive routine handling of protocols in general, and considering the "ideal" design of a protocol.
My question is: is there any protocol framing scheme out there that can definitively identify corrupt data in all cases? For example, consider the standard framing scheme of many protocols:
Field: Length in bytes
<SOH>: 1
<other framing information>: arbitrary, but fixed for a given protocol
<length>: 1 or 2
<data payload etc.>: based on length field (above)
<checksum/CRC>: 1 or 2
<ETX>: 1
For the vast majority of cases, this works fine. When you receive some data, you search for the SOH (or whatever your start byte sequence is), move forward a fixed number of bytes to your length field, and then move that number of bytes (plus or minus some fixed offset) to the end of the packet to your CRC, and if that checks out you know you have a valid packet. If you don't have enough bytes in your input buffer to find an SOH or to have a CRC based on the length field, then you wait until you receive enough to check the CRC. Disregarding CRC collisions (not much we can do about that), this guarantees that your packet is well formed and uncorrupted.
However, if the length field itself is corrupt and has a high value (which I'm running into), then you can't check the (corrupt) packet's CRC until you fill up your input buffer with enough bytes to meet the corrupt length field's requirement.
So is there a deterministic way to get around this, either in the receive handler or in the protocol design itself? I can set a maximum packet length or a timeout to flush my receive buffer in the receive handler, which should solve the problem on a practical level, but I'm still wondering if there's a "pure" theoretical solution that works for the general case and doesn't require setting implementation-specific maximum lengths or timeouts.
Thanks!
The reason why all protocols I know of, including those handling "streaming" data, chop up the datastream in smaller transmission units each with their own checks on board is exactly to avoid the problems you describe. Probably the fundamental flaw in your protocol design is that the blocks are too big.
The accepted answer of this SO question contains a good explanation and a link to a very interesting (but rather heavy on math) paper about this subject.
So in short, you should stick to smaller transmission units not only because of practical programming related arguments but also because of the message length's role in determining the security offered by your crc.
One way would be to encode the length parameter so that it would be easily detected to be corrupted, and save you from reading in the large buffer to check the CRC.
For example, the XModem protocol embeds an 8 bit packet number followed by it's one's complement.
It could mean doubling your length block size, but it's an option.
I've got a bit of an odd question. A friend of mine and I thought it would be funny to make a serial port kind of communication between computers using sound. Basically, computers emit a series of beeps to send data, and listen for beeps over a microphone to receive data. In short, the world's most annoying serial port. I have all of the basics worked out. I can filter out sounds of only one frequency and I have sent data from one computer to another. Although the transmission is fairly error free, being affected only by very loud noises, some issues still exist. My question is, what are some good ways to check the data for errors and, more importantly, recover from these errors.
My serial communication is very standard once you get past the fact it uses sound waves. I use one start bit, 8 data bits, and one stop bit in every frame. I have already considered Cyclic Redundancy Checks, and I plan to factor this into my error checking, but CRCs don't account for some of the more insidious issues. For example, consider sending two bytes of data. You send the first one, and it received correctly, but just after the stop bit of the first byte, and the start bit of the next, a large book falls on the floor, which the receiver interprets to be a start bit, now the true start bit is read as part of the data and the receiver could be reading garbage data for many bytes to come. Eventually, a pause in the data could get things back on track.
That isn't the worst of it though. Bits can be dropped too, and most error checking schemes I can think of rely on receiving a certain number of bytes. What happens when the receiver keeps waiting for bytes that may not come?
So, you can see the complexity of this question. If you can direct me to any resources, or just give me a few tips, I would greatly appreciate your help.
A CRC is just a part of the solution. You can check for bad data but then you have to do something about it. The transmitter has to re-send the data, it needs to be told to do that. A protocol.
The starting point is that you split up the data into packets. A common approach is a start byte that indicates the start of the packet, followed by a packet number, followed by a length byte that indicates the length of the packet. Followed by the data bytes and the CRC. The receiver sends an ACK or NAK back to indicate success.
This solves several problems:
you don't care about a bad start bit anymore, the pause you need to recover is always there
you start a timer when you receive the first bit or byte, declare failure when the timer expires before the entire packet is received
the packet number helps you recover from bad ACK/NAK returns. The transmitter times out and resends the packet, you can detect the duplicate
RFC 916 describes such a protocol in detail. I never heard of anybody actually implementing it (other than me). Works pretty well.