How to guarantee the transmission of data over the network?

Imagine 2 machines, A and B. A wants to make sure B receives a packet P even if there are network failures. How can that be achieved?
Scenario 1:
1) A sends P over the network to B.
Problem: if the network fails at step 1, B won't receive P and A won't know about it.
Scenario 2 with acknowledgement:
1) A sends P over the network to B.
2) B sends back ACK if it receives P.
Problem: If the network fails at step 2, A won't receive the ACK, so A can't reliably know whether B received P or not.
Having ACKs of ACKs would in turn push the problem one step further (this infinite regress is the essence of the Two Generals' Problem).

This is a huge problem which has been studied for years. Read about the TCP protocol, which guarantees that either the data will be delivered or the sender will (eventually) know that delivery may have failed.
You can start reading here: https://en.wikipedia.org/wiki/Transmission_Control_Protocol
but there are many more helpful and interesting pages about TCP on the web.
Or you could just use TCP and take advantage of the work that has already been done.
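The ACK-and-retry idea from Scenario 2 is the building block reliable protocols start from: retransmit until acknowledged, and deduplicate on the receiving side so retransmissions are harmless. Here is a minimal stop-and-wait sketch over UDP; the `seq:payload` wire format and the `STOP` sentinel are made up for illustration, not any real protocol:

```python
import socket
import threading

def receiver(sock, delivered):
    """B's side: deliver each packet at most once, ACK every copy."""
    seen = set()
    while True:
        data, addr = sock.recvfrom(1024)
        seq, payload = data.split(b":", 1)
        if payload == b"STOP":          # made-up sentinel to end the demo
            break
        if seq not in seen:             # deduplicate retransmissions
            seen.add(seq)
            delivered.append(payload)
        sock.sendto(b"ACK:" + seq, addr)  # ACK even duplicates: the ACK may have been lost

def send_reliably(sock, dest, seq, payload, retries=5, timeout=0.2):
    """A's side: retransmit P until an ACK arrives or we give up."""
    sock.settimeout(timeout)
    for _ in range(retries):
        sock.sendto(str(seq).encode() + b":" + payload, dest)
        try:
            if sock.recvfrom(1024)[0] == b"ACK:" + str(seq).encode():
                return True             # A now knows B received P
        except socket.timeout:
            continue                    # lost packet or lost ACK: retry
    return False                        # A only knows delivery *may* have failed

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
delivered = []
t = threading.Thread(target=receiver, args=(rx, delivered))
t.start()
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
ok = send_reliably(tx, rx.getsockname(), 1, b"hello")
tx.sendto(b"0:STOP", rx.getsockname())
t.join()
print(ok, delivered)   # True [b'hello']
```

Note how the sender's `False` branch matches the text above: after exhausting retries, A cannot distinguish "P was lost" from "all the ACKs were lost", which is exactly the guarantee TCP gives (delivery, or eventual knowledge that delivery may have failed).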

Related

Can 2 nodes exchange data concurrently with only 1 tcp connection?

Say there are 2 nodes, A and B. Each of them maintains a state, which is a number in its memory. If A somehow sends an increment to B, B will return 1; if A sends an increment again, B will return 2, and so on. The same holds for B. Both A and B can update their state atomically.
For sending the increment, let's say B starts a TCP server to accept connections, and A is the client that establishes the connection with B. A can send increment to B through that established TCP connection.
Now, the question is: Can B also send increment to A through the same connection, and can A respond back with its own state through that connection?
Moreover, can A and B both send increments and respond to each other concurrently through the same connection? So that if A and B send increment simultaneously to each other, they can respond back with 1.
It’s an easy problem if A and B establish 2 connections: one where A is the client sending increments to B, and another where A is the server responding to increments from B. And since there are 2 connections, A and B can send “increment” concurrently. But I wonder if it’s possible for A and B to exchange data with only one TCP connection? Does any protocol support this?
Yes, it's possible. A and B can both exchange data through the same connection. However, one of them will act as a server and the other as a client. In fact, even if A tries to connect to B and B to A at the same exact time, TCP is designed so that this scenario results in only one connection; this situation is called a simultaneous open. Keep in mind that the classic concepts of client and server don't exist per se in the TCP specs (they're just peers), although it is intuitive and helpful to regard the peer that performs the active open as the client and the one that performs the passive open as the server.
As for the data exchange, both peers can send increment messages to each other through the same connection. Check RFC 675 and RFC 793 (the original TCP specification) for more detailed, in-depth information.
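The question's increment protocol can be sketched concretely. This is a toy illustration, not a standard protocol: the line-based wire format (`INC\n` requests, `VAL <n>\n` replies) is made up so that each peer can tell requests apart from replies on the shared stream. A runs a reader thread and so does B, and both read and write the same single TCP connection concurrently:

```python
import queue
import socket
import threading

def run_peer(conn, replies):
    """Serve the peer's increments AND send one of our own, on one socket."""
    state = 0
    f = conn.makefile("rwb", buffering=0)
    def reader():
        nonlocal state
        try:
            for line in f:
                if line == b"INC\n":                 # peer's request: bump our state
                    state += 1
                    f.write(b"VAL %d\n" % state)     # reply with our new state
                elif line.startswith(b"VAL "):       # reply to OUR earlier request
                    replies.put(int(line[4:]))
        except OSError:
            pass                                     # socket torn down at exit
    threading.Thread(target=reader, daemon=True).start()
    f.write(b"INC\n")                                # send our own increment
    return replies.get(timeout=2)                    # wait for the peer's VAL

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
results = {}

def server_side():
    conn, _ = srv.accept()
    results["B"] = run_peer(conn, queue.Queue())

t = threading.Thread(target=server_side)
t.start()
cli = socket.create_connection(srv.getsockname())
results["A"] = run_peer(cli, queue.Queue())
t.join()
print(sorted(results.items()))   # [('A', 1), ('B', 1)]
```

Both sides send their increment simultaneously and both get back 1, with a single connection: TCP is full duplex, so the only work left to the application is framing and telling message types apart.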

Why does TCP termination need a 4-way handshake?

When connection sets up, there is:
Client ------SYN-----> Server
Client <---ACK/SYN---- Server ----①
Client ------ACK-----> Server
and when termination comes, there is:
Client ------FIN-----> Server
Client <-----ACK------ Server ----②
Client <-----FIN------ Server ----③
Client ------ACK-----> Server
My question is: why can't ② and ③ be set in the same packet, like ①, where the ACK and SYN are set in one packet?
After googling a lot, I recognized that the four-way handshake is actually two pairs of two-way handshakes.
If termination were really a single four-way exchange, ② and ③ could indeed be set in the same packet.
But this is a two-phase process; the first phase (i.e., the first two-way handshake) is:
Client ------FIN-----> Server
Client <-----ACK------ Server
At this moment the client is in the FIN_WAIT_2 state, waiting for a FIN from the server. TCP is a bidirectional, full-duplex protocol: at this point one direction has shut down and no more data will be sent in it, but receiving still works, so the client has to wait for the other "half-duplex" direction to be terminated.
When the FIN from the server reaches the client, the client responds with an ACK to terminate the connection.
Concluding note: ② and ③ cannot generally be merged into one packet, because they belong to different states. But if the server has no more data (or no data at all) to send when it receives the FIN from the client, it is fine to merge ② and ③ into one packet.
References:
http://www.tcpipguide.com/free/t_TCPConnectionTermination-2.htm
http://www.tcpipguide.com/free/t_TCPConnectionEstablishmentProcessTheThreeWayHandsh-3.htm
http://www.tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateMachineF-2.htm
I think ② and ③ could of course be merged technically, but that would not be flexible enough, since the two are not atomic.
The first FIN (client to server) means one thing only: "I am closing my direction of the communication."
The first ACK (server to client) is a response to that FIN: "OK, I know your direction is closed, but I'm not sure about my direction yet. Maybe my receive buffer is not fully processed. How much time I need is uncertain."
Thus ② and ③ are not appropriate to merge: only the server has the right to decide when its direction of the connection can be closed.
From Wikipedia - "It is also possible to terminate the connection by a 3-way handshake, when host A sends a FIN and host B replies with a FIN & ACK (merely combines 2 steps into one) and host A replies with an ACK.[14]"
Source:
Wikipedia - https://en.wikipedia.org/wiki/Transmission_Control_Protocol
[14] - Tanenbaum, Andrew S. (2003-03-17). Computer Networks (Fourth ed.). Prentice Hall. ISBN 0-13-066102-3.
Based on this document, we can see the detailed process of the four-way handshake as follows.
The ACK (marked ②) is sent by the TCP stack automatically, while the next FIN (marked ③) is controlled at the application level by calling the close-socket API; the application decides when to terminate the connection. So in the common case these two packets are not merged into one.
In contrast, the SYN/ACK (marked ①) is sent by the TCP stack automatically: in the connection establishment stage the process is straightforward and simpler, so the TCP stack handles it by default.
Viewed from a coding perspective, the 4-way termination is more reasonable than the 3-way, although both are possible.
When a connection is to be terminated by one side, the peer may be in one of several states: operating normally, in the middle of transmitting or receiving, or somehow already disconnected before this initiation.
The termination scheme should take at least these three into consideration, since all of them are likely to happen in real life.
From there it becomes more natural to see why. If the peer is offline, things are easy: the client can infer the peer's state from the captured packets, since the first ACK never arrives. But if the two messages were combined, it would be much harder for the client to know why the peer is not responding, because not only an offline peer but also various exceptions in the server-side processing could cause the missing packet. Not to mention that the client would have to wait a long time for a timeout. With the ACK as a separate message, both problems become easier to handle on the client side.
In this sense the separate messages work like log lines in code: each one tells you exactly how far the procedure got.
In the three-way handshake (connection setup), the server must acknowledge (ACK) the client's SYN, and it must also send its own SYN carrying the initial sequence number for the data the server will send on the connection.
That is why the server sends its SYN and the ACK of the client's SYN in a single segment.
In connection termination, it takes four segments because a FIN and an ACK are required in each direction:
② means that the received FIN (first segment) is acknowledged (ACK) by TCP automatically.
③ means that some time later, the application that received the end-of-file closes its socket, which causes its TCP to send a FIN.
The last segment is then the ACK with which the TCP receiving this final FIN acknowledges it.
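The half-duplex nature discussed above is visible directly in the sockets API: `shutdown(SHUT_WR)` sends the client's FIN (the first segment of termination) while leaving the receiving direction open, and the server's FIN only goes out when its application decides to close. A small sketch on localhost:

```python
import socket
import threading

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)

def server():
    conn, _ = srv.accept()
    data = b""
    while True:                          # read until the client's FIN (recv -> b"")
        chunk = conn.recv(1024)
        if not chunk:
            break
        data += chunk
    conn.sendall(b"reply to " + data)    # client's direction is closed; ours is not
    conn.close()                         # application-level close emits OUR FIN

t = threading.Thread(target=server)
t.start()
cli = socket.create_connection(srv.getsockname())
cli.sendall(b"request")
cli.shutdown(socket.SHUT_WR)             # FIN: "no more data from me"; reading still works
reply = b""
while True:
    chunk = cli.recv(1024)
    if not chunk:                        # server's FIN finally arrives
        break
    reply += chunk
t.join()
print(reply)   # b'reply to request'
```

The gap between the client's `shutdown` and the server's `close` is exactly the gap between segments ② and ③: the ACK is automatic, but the FIN waits for the server application.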

End-to-end property

I understand the end-to-end principle from the classic MIT paper ("End-to-End Arguments in System Design" by Saltzer, Reed, and Clark), which states that correctly executing a function between 2 remote nodes should not depend on the states of the in-between nodes.
But what is end-to-end encryption, end-to-end guarantees, end-to-end protocols, etc...? I couldn't find a precise definition of end-to-end. The term seems to be over-used.
In other words, when one describes a system property X as end-to-end, what does it mean? What is the opposite of end-to-end?
I don't think end-to-end is over-used. It merely says that the property holds from one end to the other. An "end" can be a node or a layer in the computing stack.
Consider three nodes: A, B and C. Node A wants to talk with C. B sits between A and C and forwards messages between them. B is, for example, a load balancer or a gateway.
Encryption is end-to-end if B cannot read or tamper with messages sent from A to C. A concrete example: A is your laptop, C is a remote machine on your network at home or work, and B is a VPN gateway. The encryption here is not end-to-end because only the link between A and B is actually encrypted. An attacker sitting between B and C would be able to read the cleartext. That might be fine in practice, but it is not end-to-end.
Another example. Say we don't care about encryption, but about reliable message transmission. You know that the network might corrupt bits of messages; therefore TCP and other protocols have a checksum field that is verified whenever a message is received. But the guarantees of these checksums are not necessarily end-to-end.
If A sends a message m to C relying on TCP's checksum, a node B sitting in the middle could corrupt the message undetectably. Abstracting away most details, node B basically (1) receives m, (2) checks m's checksum, (3) finds the route to C and creates a new message with m's payload, (4) calculates a new checksum for m, and (5) sends m (with the new checksum) to C. Now, if node B corrupts the message after step (2) but before step (4), the resulting message arriving at C is corrupted, but that cannot be detected by looking at m's checksum! Therefore, such a checksum is not end-to-end. Node B does not even have to be malicious: such corruption can be caused by hardware errors or, more probably, by bugs in node B. This has happened more than once in Amazon's S3 service, for example.
The solution is, obviously, to use an application-level checksum, which is end-to-end: a checksum of m's payload is appended to the payload before the lower-layer checksum is calculated.
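An application-level checksum of this kind can be sketched in a few lines. The `wrap`/`unwrap` names and the digest-prefix framing are made up for illustration; the point is that the digest is computed at A over the payload and verified at C, so a middlebox B that rewrites the message (and any hop-by-hop checksums) cannot corrupt it undetectably:

```python
import hashlib

def wrap(payload: bytes) -> bytes:
    """At A: prepend a SHA-256 digest of the payload (the end-to-end check)."""
    return hashlib.sha256(payload).digest() + payload

def unwrap(message: bytes) -> bytes:
    """At C: verify the digest before trusting the payload."""
    digest, payload = message[:32], message[32:]
    if hashlib.sha256(payload).digest() != digest:
        raise ValueError("end-to-end checksum mismatch: corrupted in transit")
    return payload

msg = wrap(b"hello C")
assert unwrap(msg) == b"hello C"            # intact delivery verifies fine

# Simulate node B flipping one payload bit between steps (2) and (4):
tampered = msg[:-1] + bytes([msg[-1] ^ 1])
try:
    unwrap(tampered)
except ValueError as e:
    print(e)   # end-to-end checksum mismatch: corrupted in transit
```

A hash is used here rather than a 16-bit ones'-complement sum purely for brevity; the end-to-end property comes from *where* the check is computed and verified, not from the particular checksum function.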

TCP/IP 3-way handshake

Why is data not transferred during the 3rd part of TCP 3-way handshake?
e.g.
(A to B)SYN
(B to A)ACK+SYN
(A to B) ACK.... why can't data be transferred along with this ACK?
I've always believed it was to keep the session establishment phase separate from the data transfer phase so that no real data is transferred until both ends of the session have agreed on the sequence numbers and session options, especially since packets arriving may be from a totally different, previous, session that just happens to have the same endpoints.
However, on further investigation, I'm not entirely certain that transmitting data with the handshake packets is disallowed. The section on TCP connection establishment in my Internetworking with TCP/IP1 book contains the following snippet:
Because of the protocol design, it is possible to send data along with the initial sequence numbers in the handshake segments. In such cases, the TCP software must hold the data until the handshake completes. Once a connection has been established, the TCP software can release data being held and deliver it to a waiting application program quickly.
Since it's certainly possible to construct a TCP packet with SYN (or ACK) and data, this may well be allowed. I've never seen it happen in the wild but, then again, I've never seen a hairy-eared dwarf lemur in the wild either, though I'm assured they exist.
It may be that it's the sockets software that prevents data from going out before the session is fully established, but TCP itself appears to consider it valid. It appears you can send data with a SYN-ACK packet (phase 2 of connection establishment), since you have the other end's sequence number and options. Likewise, sending data with the phase-3 ACK packet appears to be possible as well.
The reason the TCP software holds on to the data until the handshake is fully complete is probably due to the reason mentioned above - only once both ends have agreed on the sequence numbers can you be sure that the data is not from a previous session.
1 Internetworking with TCP/IP Volume 1 Principles, Protocols and Architecture, 3rd edition, Douglas E. Comer, ISBN 0-13-216987-8.

What factors other than latency and bandwidth affect network speed?

I have noticed that viewing images or websites hosted on US servers (I'm in Europe) is considerably slower. The main reason would be latency, because of the distance.
But if 1 packet takes n milliseconds to be received, can't this be alleviated by sending more packets simultaneously?
Does this actually happen, or are the packets sent one by one? And if yes, what determines how many packets can be sent simultaneously (something to do with the cable, I guess)?
But if 1 packet takes n milliseconds to be received, can't this be alleviated by sending more packets simultaneously?
Not in a boundless way, by TCP/IP standards, because there are algorithms that determine how much data can be in flight and not yet acknowledged, to avoid overloading the whole network.
Does this actually happen or are the packets sent one by one?
TCP can and does keep up to a certain amount of packets and data "in flight".
And if yes what determines how many packets can be sent simultaneously (something to do with the cable I guess)?
What cable? The same standards apply whether you're on cabled, wireless, or mixed sequences of connections (remember your packet goes through many routers on its way to the destination, and the sequence of routers can change from packet to packet).
You can start your study of TCP at, e.g., Wikipedia. Your specific questions deal with congestion control algorithms and standards; Wikipedia will give you pointers to all the relevant algorithms and RFCs. But the whole picture won't do you much good if you try to start studying at that spot without a lot of other understanding of TCP (e.g., its flow control concepts).
Wikipedia and similar encyclopedia/tutorial sites can only give you a summary of a summary, while RFCs are not written to be readable or understandable by non-experts. If you care about TCP, I'd recommend starting your study with Stevens' immortal trilogy of books (there are many other valid ones, but Stevens' are by far my personal favorites).
The issue is parallelism.
Latency does not directly affect your pipe's throughput. For instance, a dump truck driven across the country has terrible latency but wonderful throughput if you stuff it full of 2 TB tapes.
The problem is that your web browser can't start asking for things until it knows what to ask for. So when you load a web page with ten images, you have to wait until the img tags arrive before you can send the requests for them. Everything feels slower, not because your connection is saturated but because there's down time between making one request and the next.
A prefetcher helps alleviate this issue.
As far as "multiple packets at a time" is concerned, a single TCP connection will have many packets in transit at once, as permitted by the window the two ends negotiate. But that only helps within one connection at a time...
TCP uses what's called a sliding window. Basically, X is the amount of buffer space the receiver has for reassembling out-of-order packets. The sender can send X bytes past the last acknowledged byte, say sequence number N. This way you can fill the pipe between sender and receiver with X unacknowledged bytes, on the assumption that the packets will likely get there, and if not, the receiver will let you know by not acknowledging the missing packets. On each response packet the receiver sends a cumulative acknowledgement, saying "I've got all the bytes up to this sequence number." This lets it ack multiple packets at once.
Imagine a client sending 3 packets, X, Y, and Z, starting at sequence number N. Due to routing, Y arrives first, then Z, and then X. Y and Z will be buffered in the destination's stack, and when X arrives the receiver will ack N + (the cumulative lengths of X, Y, and Z). This bumps the start of the sliding window forward, allowing the client to send additional packets.
With selective acknowledgement it's possible to ack portions of the sliding window and ask the sender to retransmit just the lost portions. In the classic scheme, if Y was lost the sender would have to resend both Y and Z; selective acknowledgement means the sender can resend just Y. Take a look at the Wikipedia page.
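The cumulative-ACK behavior above can be modeled in a few lines. This is a toy simulation (sequence numbers start at 0 instead of N for simplicity), showing Y and Z being buffered out of order and the ACK jumping past all three packets once X fills the gap:

```python
def make_receiver():
    """Toy receiver: buffer out-of-order segments, emit cumulative ACKs."""
    expected = 0      # next byte we need (N = 0 for simplicity)
    buffered = {}     # out-of-order segments keyed by starting sequence number
    acks = []
    def on_segment(seq, data):
        nonlocal expected
        buffered[seq] = data
        while expected in buffered:               # consume the in-order prefix
            expected += len(buffered.pop(expected))
        acks.append(expected)                     # "I've got everything up to here"
    return on_segment, acks

on_segment, acks = make_receiver()
on_segment(4, b"YYYY")   # Y arrives first: out of order, buffered, ACK stays at 0
on_segment(8, b"ZZ")     # Z arrives next: still a gap at 0, ACK stays at 0
on_segment(0, b"XXXX")   # X fills the gap: one ACK covers X, Y, and Z at once
print(acks)   # [0, 0, 10]
```

The repeated `0` ACKs are exactly the "duplicate ACKs" a real sender watches for: they tell it the receiver is still missing the segment starting at that number.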
Regarding speed, one thing that may slow you down is DNS. That adds an additional round-trip, if the IP isn't cached, before you can even request the image in question. If it's not a common site this may be the case.
TCP/IP Illustrated, Volume 1, by W. Richard Stevens is tremendous if you want to learn more. The title sounds dry, but the packet diagrams and annotated arrows from one host to the other really make this stuff easier to understand. It's one of those books you can learn from and then keep as a reference. It's one of my 3 go-to books for networking projects.
TCP will try to send more and more packets at a time, up to a certain amount, until it starts noticing that packets are being dropped (typically by overloaded routers and switches along the path), and then it throttles back. This is how it determines the size of the window (and hence the bandwidth) it can use. If the sending host notices a lot of dropped packets from its end, the receiving host just sees a slow connection; the sender could very well be blasting you with data that you never see.
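The grow-until-loss, then throttle-back behavior follows the additive-increase/multiplicative-decrease (AIMD) pattern. A heavily simplified sketch (real TCP adds slow start, fast retransmit, and more; here one loop iteration stands in for one round trip):

```python
def aimd(window, loss_events):
    """Additive-increase/multiplicative-decrease: +1 packet per RTT, halve on loss."""
    history = []
    for lost in loss_events:
        history.append(window)
        if lost:
            window = max(1, window // 2)   # multiplicative decrease on loss
        else:
            window += 1                    # additive increase per round trip
    return history

# Grow for 4 round trips, lose a packet, then grow again:
print(aimd(1, [False] * 4 + [True] + [False] * 3))   # [1, 2, 3, 4, 5, 2, 3, 4]
```

The resulting sawtooth is the classic shape of TCP throughput over time: the window probes upward until the network signals congestion via a drop, backs off, and probes again.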
Parallel packet transmission is also possible (though there can be a limit on the number of packets sent at a time). You can find more information under the topics of message switching, packet switching, circuit switching, and virtual-circuit packet switching.

Resources