Can anybody explain how the receiver know if two nonconsecutive TCP segments belong to the same packet? - tcp

Can anybody explain how does the receiver know if two nonconsecutive TCP segments belong to the same or different packets ? And how does it know if the next segment is the last segments in the packet ?

The receiver doesn't assemble TCP segments into packets, it assembles them into streams. The receiver knows the location, in the stream, of its received segment by its sequence number.
Is it possible that you are expecting the count result of the receiving application's read() system call to conform to the sending application's write() system call? If so, you will be disappointed. TCP streams are byte-wise, not packet-wise, streams. They neither preserve nor honor the boundaries of the sending system calls.

TCP does not deal with fragmentation. That's an IP problem. Packets arrive at the TCP level only when complete. IP uses special fields in the header that indicates whether the packet is fragmented or not, and, if yes, whether the fragment received is the last one or not.
You may take a look :
Transmission Control Protocol
Internet Protocol

Related

how to know whether the TCP ends or not?

TCP is stream to communicate and it has varying length. So in the application, how I can know whether the TCP ends or not?
In the transfom layer, The TCP packet header doesn't have a length field and its length is varying, how can the TCP layer know where is the end.
You design a protocol that runs on top of TCP that includes a length field (or a message terminator). You can look at existing protocols layered on top of TCP (such as DNS, HTTP, IRC, SMTP, SMB, and so on) to see how they do this.
To avoid pain, it helps to have a thorough understanding of several different protocols layered on top of TCP before attempting to design your own. There's a lot of subtle details you can easily get wrong.
If you look at this post, it gives a good answer.
Depending on what you are communicating with, or how you are communicating there will need to be some sort of character sequence that you look for to know that an individual message or transmission is done if you plan to leave the socket open.
n the transfom layer
I assume you mean 'transport layer'?
The TCP packet header doesn't have a length field and its length is varying
The IP header has a length field. Another one in the TCP header would be redundant.
how can the TCP layer know where is the end.
From the IP header length word, less the IP and TCP header sizes.
When a party of a TCP connection receives a FIN or RST signal it knows that the other side has stopped sending. At an API level you can call shutdown to give that signal. The other side will then get a zero length read and knows that nothing more will be coming.

tcp: recomposing data at the end

How do TCP knows which is the last packet of a large file (that was segmented by tcp) in the scenario that the connection is kept-established. (like ftp or sending mp3 on yahoo messenger)
I mean how does it know which packet carries data of one.mp3 and which packet carries data of another.mp3 ??
Anyone ?
Thank you
There are at least 2 possible approaches.
Declare upfront how much data you're going to send. Something like a packet that declares Sending a message that's 4008 bytes long
The second approach is to use a terminating sequence (nastier to process)
So the receiver:
Tries to read the declared amount or
Scans for the terminating sequence
TCP is a stream protocol and fragmentation should be transparent to a TCP application. It operates on streams of data, never packets. A stream is assembled to its intended order using the sequence numbers. The sequence of bytes send by application is encapsulated in tcp segments. The stream is recreated on the receiver side before data is delivered to the application.
The IP protocol can do fragmentation.
Each TCP segment goes to the IP layer and may be fragmented there. Segment is reassembled by collecting all of the packets and offset field from the header is used to put it in the right place.

TCP/IP protocol and fragmentation

Using the TCP/IP protocol, given a connection between a client and a server, are the packets sent by the client to the server always received in the same order they were sent?
For example, if the client sends 3 packets of data, A, B and C, will the server always receive A first followed by B and C or is it possible for the server to receive C first, followed by A and B?
At IP level, packets may arrive in any order (if they arrive). At TCP level, the data stream is guaranteed to be ordered in the same manner on both ends.
That means yes, the server will always receive A then B then C. As long as you are using TCP.
When using TCP, data is received by the destination application in the same order as it is sent by the source application.
See the following for more details:
http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Data_transfer
TCP is a transmission protocol, and it transmits data by sending the data out in IP packets over the underlying IP network. TCP is responsible for ensuring the correct transmission of the data, which includes ordering the arriving packets, re-requesting missing ones and discarding duplicates.
TCP as such does not expose any notion of "packet" to the user; the fact that the data is chunked into IP packets is a detail of the "over IP" implementation. A different implementation, e.g. TCP-over-bicycle-courier, might employ an entirely different scheme.
It cannot happen that you receive data in a different order on the application side over a TCP socket.
It may happen that packets are received in a different order by the networking layer of the OS, but TCP makes it a requirement that the upper levels get data in order. It is the OS' role to ask again for unreceived fragments etc and assemble these fragments. So, you need not worry.
UDP, on the other hand, offers no such guarantee.
The server (as the physical NIC of the machine) might receive them in any order. Your OS might receive them in any order again - that will mostly (but not allways) be the order of physical reception. Your client application is guaranteed to receive them in correct order, thats a property of TCP
In general, packets will be received in the same order they are transmitted. But the network may drop or reorder packets. For example, packets may take different routes and arrive out of order. Packets may be lost or even duplicated on the network. The TCP implementation is responsible for retransmitting packets that are lost, acknowledging packets that are received, ignoring duplicated packets, all with the objective of accurately reconstructing the transmitted byte stream at the receiver.
At the application level, you send a stream of bytes and receive a stream of bytes. TCP does whatever is needed to ensure the received stream of bytes is identical to the sent stream of bytes, regardless of what happens to the packets on the network.

Why do we say the IP protocol in TCP/IP suite is connectionless?

Why is the IP called a connectionless protocol? If so, what is the connection-oriented protocol then?
Thanks.
Update - 1 - 20:21 2010/12/26
I think, to better answer my question, it would be better to explain what "connection" actually means, both physically and logically.
Update - 2 - 9:59 AM 2/1/2013
Based on all the answers below, I come to the feeling that the 'connection' mentioned here should be considered as a set of actions/arrangements/disciplines. Thus it's more an abstract concept rather than a concrete object.
Update - 3 - 11:35 AM 6/18/2015
Here's a more physical explanation:
IP protocol is connectionless in that all packets in IP network are routed independently, they may not necessarily go through the same route, while in a virtual circuit network which is connection oriented, all packets go through the same route. This single route is what 'virtual circuit' means.
With connection, because there's only 1 route, all data packets will arrive in the same order as they are sent out.
Without connection, it is not guaranteed all data packets will arrive
in the same order as they are sent out.
Update - 4 - 9:55 AM 2016/1/20/Wed
One of the characteristics of connection-oriented is that the packet order is preserved. TCP use a sequence number to achieve that but IP has no such facility. Thus TCP is connection-oriented while IP is connection-less.
The basic idea is pretty simple: with IP (on its own -- no TCP, UDP, etc.) you're just sending a packet of data. You simply send some data onto the net with a destination address, but that's it. By itself, IP gives:
no assurance that it'll be delivered
no way to find out if it was
nothing to let the destination know to expect a packet
much of anything else
All it does is specify a minimal packet format so you can get some data from one point to another (e.g., routers know the packet format, so they can look at the destination and send the packet on its next hop).
TCP is connection oriented. Establishing a connection means that at the beginning of a TCP conversation, it does a "three way handshake" so (in particular) the destination knows that a connection with the source has been established. It keeps track of that address internally, so it can/will/does expect more packets from it, and be able to send replies to (for example) acknowledge each packet it receives. The source and destination also cooperate to serial number all the packets for the acknowledgment scheme, so each end knows whether packets it sent were received at the other end. This doesn't involve much physically, but logically it involves allocating some memory on both ends. That includes memory for metadata like the next packet serial number to use, as well as payload data for possible re-transmission until the other side acknowledges receipt of that packet.
TCP/IP means "TCP over IP".
TCP
--
IP
TCP provides the "connection-oriented" logic, ordering and control
IP provides getting packets from A to B however it can: "connectionless"
Notes:
UDP is connection less but at the same level as TCP
Other protocols such as ICMP (used by ping) can run over IP but have nothing to do with TCP
Edit:
"connection-oriented" mean established end to end connection. For example, you pick up the telephone, call someone = you have a connection.
"connection-less" means "send it, see what happens". For example, sending a letter via snail mail.a
So IP gets your packets from A to B, maybe, in any order, not always eventually. TCP sorts them out, acknowledges them, requests a resends and provides the "connection"
Connectionless means that no effort is made to set up a dedicated end-to-end connection, While Connection-Oriented means that when devices communicate, they perform handshaking to set up an end-to-end connection.
IP is an example of the Connectionless protocols , in this kind of protocols you usually send informations in one direction, from source to destination without checking to see if the destination is still there, or if it is prepared to receive the information . Connectionless protocols (Like IP and UDP) are used for example with the Video Conferencing when you don't care if some packets are lost , while you have to use a Connection-Oriented protocol (Like TCP) when you send a File because you want to insure that all the packets are sent successfully (actually we use FTP to transfer Files). Edit :
In telecommunication and computing in
general, a connection is the
successful completion of necessary
arrangements so that two or more
parties (for example, people or
programs) can communicate at a long
distance. In this usage, the term has
a strong physical (hardware)
connotation although logical
(software) elements are usually
involved as well.
The physical connection is layer 1 of
the OSI model, and is the medium
through which the data is transfered.
i.e., cables
The logical connection is layer 3 of
the OSI model, and is the network
portion. Using the Internetwork
Protocol (IP), each host is assigned a
32 bit IP address. e.g. 192.168.1.1
TCP is the connection part of TCP/IP. IP's the addressing.
Or, as an analogy, IP is the address written on the envelope, TCP is the postal system which uses the address as part of the work of getting the envelope from point A to point B.
When two hosts want to communicate using connection oriented protocol, one of them must first initiate a connection and the other must accept it. Logically a connection is made between a port in one host and other port in the other host. Software in one host must perform a connect socket operation, and the other must perform an accept socket operation. Physically the initiator host sends a SYN packet, which contains all four connection identifying numbers (source IP, source port, destination IP, destination port). The other receives it and sends SYN-ACK, the initiator sends an ACK, then the connection are established. After the connection established, then the data could be transferred, in both directions.
In the other hand, connectionless protocol means that we don't need to establish connection to send data. It means the first packet being sent from one host to another could contain data payloads. Of course for upper layer protocols such as UDP, the recipient must be ready first, (e.g.) it must perform a listen udp socket operation.
The connectionless IP became foundation for TCP in the layer above
In TCP, at minimal 2x round trip times are required to send just one packet of data. That is : a->b for SYN, b->a for SYN-ACK, a->b for ACK with DATA, b->a for ACK. For flow rate control, Nagle's algorithm is applied here.
In UDP, only 0.5 round trip times are required : a->b with DATA. But be prepared that some packets could be silently lost and there is no flow control being done. Packets could be sent in the rate that are larger than the capability of the receiving system.
In my knowledge, every layer makes a fool of the one above it. The TCP gets an HTTP message from the Application layer and breaks it into packets. Lets call them data packets. The IP gets these packets one by one from TCP and throws it towards the destination; also, it collects an incoming packet and delivers it to TCP. Now, TCP after sending a packet, waits for an acknowledgement packet from the other side. If it comes, it says the above layer, hey, I have established a connection and now we can communicate! The whole communication process goes on between the TCP layers on both the sides sending and receiving different types of packets with each other (such as data packet, acknowledgement packet, synchronization packet , blah blah packet). It uses other tricks (all packet sending) to ensure the actual data packets to be delivered in ordered as they were broken and assembled. After assembling, it transfers them to the above application layer. That fool thinks that it has got an HTTP message in an established connection but in reality, just packets are being transferred.
I just came across this question today. It was bouncing around in my head all day and didn't make any sense. IP doesn't handle transport. Why would anyone even think of IP as connectionless or connection oriented? It is technically connectionless because it offers no reliability, no guaranteed delivery. But so is my toaster. My toaster offers no guaranteed delivery, so why not call aa toaster connectionless too?
In the end, I found out it's just some stupid title that someone somewhere attached to IP and it stuck, and now everyone calls IP connectionless and has no good reason for it.
Calling IP connectionless implies there is another layer 3 protocol that is connection oriented, but as far as I know, there isn't and it is just plain stupid to specify that IP is connectionless. MAC is connectionless. LLC is connectionless. But that is useless, technically correct info.

how to reassemble tcp segment?

im now developing a project using winpcap..as i have known packets being sniffed are usually fragmented packets.
how to reassemble this TCP segements?..any ideas, suggestion or tutorials available?..
this i assume to be the only way i can view the HTTP header...
thanks!..
tcp is a byte stream protocol.
the sequence of bytes sent by your http application is encapsulated in tcp data segments and the byte stream is recreated before the data is delivered to the application on the other side.
since you are accessing the tcp datasegments using winpcap, you need to go to the data portion of the segment. the header of tcp has a fixed length of 20 bytes + an optional part which you need to determine using the winpcap api.
the length of data part in the tcp segment is determined by subtracting the tcp header length (obtained from a field in the tcp segment) and the ip header length (from a field in the ip datagram that encapsulates the tcp segment) from the total length (obtained from another field in the ip datagram).
so now you have the total segment length and the length of the data part within the segment. so you know offset where the http request data starts.
the offset is
total length-length of data part
or
length of ip-header + length of tcp header
i have not used winpcap. so you will have to find out how to get these fields using the api.
also ip datagrams may be further fragmented but i am expecting that you are provided only reassembled datagrams using this api. you are good to go!
There is no such thing as a TCP fragment. The IP protocol has fragments. TCP is a stream protocol. You can assemble the stream to its intended order by following the sequence numbers of both sides. Every TCP Packet goes to the IP level and can be fragmented there. You can assemble each packet by collecting all of the fragments and following the fragment offset from the header.
All of the information you need is in the headers. The wikipedia articles are quite useful in explaining what each field is
http://en.wikipedia.org/wiki/TCP_header#Packet_structure
http://en.wikipedia.org/wiki/IPv4#Header
PcapPlusPlus offers this capability out-of-the-box for all major OS's (including Windows). Please check out the TcpReassembly example to see a working code and the API documentation to understand how to use the TCP reassembly feature
Depending on the whose traffic you're attempting to passively reassemble, you may run into some TCP obfuscation techniques designed to confuse people trying to do exactly what you're trying to do. Check out this paper on different operating system reassembly behaviors.
libtins provides classes to perform TCP stream reassembly in a very high level way, so you don't have to worry about TCP internals to do so.

Resources