TCP send function retransmission logic?

When we send a packet and retransmission starts, do we come out of the send function or not?
In my case my application takes a lock, waits for send to return, and then releases the lock.
But in my scenario it never came back. I want to know whether we really come out of the send function when there is a retransmission.

The send function transfers data into the socket send buffer, blocking while there isn't enough room.
Data is removed from the socket send buffer when acknowledged.
Retransmission starts when data that has been sent to the peer hasn't been acknowledged within the appropriate timeout interval.
The interactions between retransmission and the send() function consist basically of this: if data hasn't been acknowledged, it is still in the send buffer, which may cause the send() function to block.
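If blocking indefinitely inside send() is a problem (for example while holding a lock, as in the question), one option is to put an upper bound on the blocking time with SO_SNDTIMEO, or to check writability with select() before sending. Below is a minimal sketch of the timeout approach; the helper name send_with_timeout and the 5-second value are illustrative assumptions, not anything from the question.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Sketch: bound how long a blocking send() may wait for send-buffer space.
 * 'sock' is assumed to be a connected, blocking TCP socket. */
int send_with_timeout(int sock, const void *buf, size_t len)
{
    struct timeval tv = { 5, 0 };            /* 5 seconds is an arbitrary choice */

    if (setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0)
        return -1;

    ssize_t n = send(sock, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        /* The send buffer stayed full for the whole timeout: the peer is not
         * acknowledging data, possibly because retransmissions are failing. */
        fprintf(stderr, "send timed out: %s\n", strerror(errno));
    }
    return (int)n;
}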

Related

Basic TCP protocol questions -- What happens on send() and recv()

I have some basic questions on the TCP protocol.
Situation: Machine_A calls send(sockfd) to send data to Machine_B. send() call succeeds.
Question: When the send() call returns, does it mean the data has already reached Machine_B? Or has it just been accepted by the operating system?
Situation: Machine_A calls send(sockfd) to send data to Machine_B. But application_B on Machine_B has not been reading from the socket fast enough. Application_A is writing at 10 MB/s but Application_B is reading at only 1 KB/s.
Question:
When does the send() call succeed on Machine_A in this case?
Does it succeed the moment the data is submitted to OS_A on Machine_A or does it wait until there is an acknowledgement from OS_B?
Does OS_B require Application_B to pull the packets before the data is acknowledged to OS_A?
send only cares about putting data into the local socket buffer, i.e. it will not wait for an ACK from the recipient's machine, let alone wait until the data is processed by the recipient application (which happens even later). If you need this kind of information you need some application-level acknowledgement. Moreover, while an ACK gets sent by TCP, it would not get sent by other protocols like UDP anyway.
send will only fail if it cannot put data into the socket buffer, either because there is no socket buffer (socket closed) or because the socket buffer is already full and send was called non-blocking. If the socket buffer is full and send is called in blocking mode, it will simply block until there is space in the socket buffer again.
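As the answer says, knowing that the peer application actually processed the data requires an application-level acknowledgement. Here is a minimal sketch of such an exchange, assuming the peer sends a single ack byte back after handling each message; ACK_BYTE and the whole scheme are illustrative assumptions, not part of any standard.

#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define ACK_BYTE 0x06   /* illustrative application-level ack value */

/* Sketch: send a buffer, then block until the peer application sends a
 * one-byte acknowledgement back.  Returns 0 once acked, -1 on failure. */
int send_and_wait_for_ack(int sock, const void *buf, size_t len)
{
    if (send(sock, buf, len, 0) != (ssize_t)len)
        return -1;                        /* short send or error */

    unsigned char ack;
    if (recv(sock, &ack, 1, 0) != 1 || ack != ACK_BYTE)
        return -1;                        /* peer never confirmed processing */

    return 0;                             /* peer's application saw the data */
}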

How to identify pushback under Win IOCP?

How can I identify TCP pushback when using IOCP? I.e. how can I find out that the receiver is not receiving, that the tx/rx buffers on both sides of the connection are full, and that the sender should stop sending more data?
With any async TCP send operation the way to determine the rate that the peer is receiving data is to monitor the rate of send completions on the sender.
I've written about this in depth here. In summary: when the receiver's buffers fill, TCP flow control comes into operation and the TCP window shrinks, so the sender cannot send, which causes the sender's TCP buffers to fill. That in turn means async send requests cannot complete. If you track the number of outstanding send requests that are pending, you can spot this situation and throttle the sender.
When this happens, the send won't complete.
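A minimal sketch of that counting idea in C on Winsock, not a complete IOCP program: the names g_pending_sends, queue_send, on_send_complete and the threshold of 64 are illustrative, and all the surrounding IOCP setup (socket creation, CreateIoCompletionPort, the worker thread loop) is omitted.

#include <winsock2.h>
#include <windows.h>

/* Sketch: count WSASend operations that have been issued but not yet
 * completed; when too many are pending, the peer is not keeping up. */
static volatile LONG g_pending_sends = 0;
#define MAX_PENDING_SENDS 64               /* arbitrary throttle threshold */

BOOL queue_send(SOCKET s, WSABUF *buf, LPWSAOVERLAPPED ov)
{
    if (InterlockedIncrement(&g_pending_sends) > MAX_PENDING_SENDS) {
        /* Many sends are still pending: TCP flow control has filled the
         * local send buffer, so back off and throttle the producer. */
        InterlockedDecrement(&g_pending_sends);
        return FALSE;
    }

    DWORD sent_now;
    if (WSASend(s, buf, 1, &sent_now, 0, ov, NULL) == SOCKET_ERROR &&
        WSAGetLastError() != WSA_IO_PENDING) {
        InterlockedDecrement(&g_pending_sends);
        return FALSE;                      /* genuine send failure */
    }
    return TRUE;
}

/* Call this from the IOCP worker thread each time GetQueuedCompletionStatus
 * returns a completed send for this socket. */
void on_send_complete(void)
{
    InterlockedDecrement(&g_pending_sends);
}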

TCP socket continuously returns EAGAIN

In my testing, I found that when I send packets of 1000-5000 bytes from my sender, they get assembled/bundled at the receiver into sizes of 8000-14000 bytes. I checked the Wireshark capture to confirm this.
I have 2 questions:
1) Who bundles these packets in between? The receiver receives them, and I use select() to detect data and then call the recvmsg API.
2) Since the packet lengths increase at the receiver, I implemented partial reception so that recvmsg can return partial data as well. In this case, after some time the recvmsg call returns EAGAIN with 0 bytes.
The connection with the peer is still up, because the peer is still sending packets, so why is the recvmsg call returning EAGAIN?
Please help!
I found that when I send packets of 1000-5000 bytes from my sender, they get assembled/bundled at the receiver into sizes of 8000-14000 bytes
1) Who bundles these packets in between? The receiver receives them, and I use select() to detect data and then call the recvmsg API.
If those are Ethernet packets of size 9000 bytes then you must be using jumbo frames.
TCP is a stream, there is no concept of a message. Data sent in multiple send() calls may be received in one recv() call and vice versa.
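Since TCP has no message boundaries, applications that need them usually add their own framing, such as a length prefix. Below is a minimal sketch of the receiving side, assuming the sender prefixes every message with a 4-byte big-endian length; this framing and the helper names are an illustration, not something the asker's sender necessarily uses.

#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sketch: read exactly 'len' bytes from a blocking TCP socket, looping
 * because recv() may return fewer bytes than requested. */
static int recv_exact(int sock, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(sock, (char *)buf + got, len - got, 0);
        if (n <= 0)
            return -1;                     /* error or orderly close */
        got += (size_t)n;
    }
    return 0;
}

/* Receive one framed message: a 4-byte big-endian length header followed
 * by that many payload bytes. */
int recv_framed(int sock, void *buf, uint32_t bufsize, uint32_t *msglen)
{
    uint32_t hdr;
    if (recv_exact(sock, &hdr, sizeof(hdr)) < 0)
        return -1;
    *msglen = ntohl(hdr);
    if (*msglen > bufsize)
        return -1;                         /* message larger than caller's buffer */
    return recv_exact(sock, buf, *msglen);
}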
2) Since the packet lengths increase at the receiver, I implemented partial reception so that recvmsg can return partial data as well. In this case, after some time the recvmsg call returns EAGAIN with 0 bytes. The connection with the peer is still up, because the peer is still sending packets, so why is the recvmsg call returning EAGAIN?
It means you are using non-blocking sockets and no more data is available in the socket receive buffer. In this case you should use select/poll/epoll to wait on one or more sockets until more data is available for reading.
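A minimal sketch of that pattern with select() on a non-blocking socket; the helper name recv_nonblocking is an assumption for illustration.

#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sketch: on a non-blocking socket, treat EAGAIN/EWOULDBLOCK from recv()
 * as "no data right now" and go back to waiting in select() instead of
 * treating it as a fatal error. */
ssize_t recv_nonblocking(int sock, char *buf, size_t bufsize)
{
    for (;;) {
        ssize_t n = recv(sock, buf, bufsize, 0);
        if (n >= 0)
            return n;                      /* n bytes read; 0 means peer closed */

        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                     /* real receive error */

        /* Receive buffer is empty: wait until the socket is readable again. */
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);
        if (select(sock + 1, &rfds, NULL, NULL, NULL) < 0)
            return -1;
    }
}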

Disconnect and Reconnect a connected datagram socket

I am trying to create an iterative server based on datagram sockets (UDP).
It calls connect() for the first client, whose address it gets from the first recvfrom() call (yes, I know this is not a real connection).
After having served this client, I disconnect the UDP socket (calling connect() with AF_UNSPEC).
Then I call recvfrom() to get the first packet from the next client.
Now the problem is that the recvfrom() call in the second iteration of the loop returns 0. My clients never send empty packets, so what could be going on?
This is what I am doing (pseudocode):
s = socket(PF_INET, SOCK_DGRAM, 0)
bind(s)
for(;;)
{
recvfrom(s, header, &client_address) // get first packet from client
connect(s,client_address) // connect to this client
serve_client(s);
connect(s, AF_UNSPEC); // disconnect, ready to serve next client
}
EDIT: I found the bug: my client was accidentally sending an empty packet.
Now my problem is how to make the client wait to get served, instead of sending a request into nowhere (the server is connected to another client and isn't serving anyone else yet).
connect() is really completely unnecessary on SOCK_DGRAM.
Calling connect does not stop you receiving packets from other hosts, nor does it stop you sending them. Just don't bother, it's not really helpful.
CORRECTION: yes, apparently it does stop you receiving packets from other hosts. But doing this in a server is a bit silly, because any other clients would be locked out while you were connect()ed to one. Also, you'll still need to catch "chaff" packets that float around. There are probably some race conditions associated with connect() on a DGRAM socket: what happens if you call connect() and packets from other hosts are already in the buffer?
Also, 0 is a valid return value from recvfrom(), as empty (no data) packets are valid and can exist (indeed, people often use them). So you can't check whether something has succeeded that way.
In all likelihood, a zero byte packet was in the queue already.
Your protocol should be engineered to minimise the chance of an errant datagram being misinterpreted; for this reason I'd suggest you don't use empty datagrams, and use a magic number instead.
UDP applications MUST be capable of recognising "chaff" packets and dropping them; they will turn up sooner or later.
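A minimal sketch of that advice: stamp every datagram with a magic number and silently drop anything that doesn't carry it. PROTO_MAGIC and the helper name are illustrative, not part of any real protocol.

#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define PROTO_MAGIC 0x4D594150u            /* illustrative 4-byte magic value */

/* Sketch: receive one datagram and treat anything that doesn't start with
 * the protocol's magic number as chaff to be dropped.  Returns the payload
 * length, or -1 for chaff and errors. */
ssize_t recv_valid_datagram(int sock, char *buf, size_t bufsize,
                            struct sockaddr *from, socklen_t *fromlen)
{
    ssize_t n = recvfrom(sock, buf, bufsize, 0, from, fromlen);
    if (n < (ssize_t)sizeof(uint32_t))
        return -1;                         /* too short to be one of ours */

    uint32_t magic;
    memcpy(&magic, buf, sizeof(magic));
    if (ntohl(magic) != PROTO_MAGIC)
        return -1;                         /* chaff: drop it and move on  */

    return n;
}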
man connect:
...
If the initiating socket is not connection-mode, then connect()
shall set the socket’s peer address, and no connection is made.
For SOCK_DGRAM sockets, the peer address identifies where all
datagrams are sent on subsequent send() functions, and limits
the remote sender for subsequent recv() functions. If address
is a null address for the protocol, the socket’s peer address
shall be reset.
...
Just a correction in case anyone stumbles across this like I did. To disconnect, connect() needs to be called with the sa_family member of the sockaddr set to AF_UNSPEC, not just passed AF_UNSPEC.
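For anyone who wants it spelled out, a minimal sketch of that disconnect call; the helper name udp_disconnect is illustrative.

#include <string.h>
#include <sys/socket.h>

/* Sketch: dissolve a UDP socket's association by calling connect() with a
 * sockaddr whose sa_family member is AF_UNSPEC, as described above. */
int udp_disconnect(int sock)
{
    struct sockaddr addr;
    memset(&addr, 0, sizeof(addr));
    addr.sa_family = AF_UNSPEC;
    return connect(sock, &addr, sizeof(addr));
}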

How does TCP/IP report errors?

How does TCP/IP report errors when packet delivery fails permanently? All Socket.write() APIs I've seen simply pass bytes to the underlying TCP/IP output buffer and transfer the data asynchronously. How then is TCP/IP supposed to notify the developer if packet delivery fails permanently (i.e. the destination host is no longer reachable)?
Any protocol that requires the sender to wait for confirmation from the remote end will get an error message. But what happens for protocols where a sender doesn't have to read any bytes from the destination? Does TCP/IP just fail silently? Perhaps Socket.close() will return an error? Does the TCP/IP specification say anything about this?
TCP/IP is a reliable byte stream protocol. All your bytes will get to the receiver or you'll get an error indication.
The error indication will come in the form of a closed socket. Regardless of what the communication pattern (who does the sending), if the bytes can't be delivered, the socket will close.
So the question is, how do you see the socket close? If you're never reading, you'd eventually get an error trying to write to the closed socket (with ECONNRESET errno, I think).
If you have a need to sleep or wait for input on another file handle, you might want to do your waiting in a select() call where you include the socket in the list of sources you're waiting on (even if you never expect to receive anything). If the select() indicates that the socket is ready for a read call, you may get a -1 return (with ECONNRESET, I think). An EOF would indicate an orderly close (the other side did a shutdown() or close()).
How to distinguish this error close from a clean close (other program exiting, for example)? The errno values may be enough to distinguish error from orderly close.
If you want an unambiguous indication of a problem, you'll probably need to build some sort of application level protocol above the socket layer. For example, a short "ack" message sent by the receiver back to the sender. Then the violation of that higher level application protocol (sender didn't see an ack) would be a confirmation that it was an error close vs a clean close.
The sockets API has no way of informing the writer exactly how many bytes have been received as acknowledged by the peer. There are no guarantees made by the presence of a successful shutdown or close either.
The TCP/IP specification says nothing about the application interface (which is nearly always the sockets API).
SCTP is an alternative to TCP which attempts to address these shortcomings, among others.
In C, when you write to a socket with send(), you get back the number of bytes that were actually sent. If this does not match the number of bytes you meant to send, you have a problem. Also, when you write to a socket whose connection has failed, you get SIGPIPE. Before you start socket handling, you need to have a signal handler in place that will alert you when SIGPIPE arrives.
If you are reading from a socket, you really should wrap it with an alarm so you can time out, e.g. "alarm(timeout_val); recv(); alarm(0)". Check the return code of recv: if it's 0, the connection has been closed; a negative return indicates a read failure, and you need to check errno.
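A minimal sketch of those checks. Note that it ignores SIGPIPE (so the failure surfaces as an EPIPE error from send()) rather than installing an alerting handler as the answer suggests, and the helper names are illustrative.

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Sketch: with SIGPIPE ignored, writing to a dead connection makes send()
 * return -1 with errno set to EPIPE instead of killing the process. */
static void setup_signals(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* recv() return-value checks: 0 means the peer closed the connection,
 * a negative value means a read failure whose cause is in errno. */
static int read_once(int sock, char *buf, size_t bufsize)
{
    ssize_t n = recv(sock, buf, bufsize, 0);
    if (n == 0)
        return 0;                          /* orderly close by the peer */
    if (n < 0) {
        fprintf(stderr, "recv failed: %s\n", strerror(errno));
        return -1;                         /* e.g. ECONNRESET            */
    }
    return (int)n;                         /* got n bytes                */
}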
TCP is built upon the IP protocol, which is the centerpiece of the Internet, providing much of the interoperability that drives routing, which determines how to get packets from their source to their destination. The IP protocol specifies that error messages should be sent back to the sender via the Internet Control Message Protocol (ICMP) when a packet fails to reach its destination. Some of the reasons include the Time To Live (TTL) field being decremented to zero, often meaning the packet got stuck in a routing loop, or the packet being dropped because switch contention caused buffer overruns. As others have said, it is the responsibility of the socket API being used to relay these errors at the IP layer up to the application interacting with the network at the TCP layer.
TCP/IP packets are either raw, UDP, or TCP. TCP requires each byte to be acked, and it will retransmit bytes that are not acked in time. Raw and UDP are connectionless (i.e. best effort), so any lost packets (barring some ICMP cases, but many of these get filtered for security) are silently dropped. Upper-layer protocols can add reliability, as is done with some raw OSPF packets.

Resources