From the HAProxy documentation on client timeouts:
It is a good practice to cover one or several TCP packet losses by
specifying timeouts that are slightly above multiples of 3 seconds
(eg: 4 or 5 seconds).
That seems like an arbitrary number. What is the significance of the 3-second figure?
It appears this is the default TCP retransmission timeout. From a Microsoft KB article:
TCP starts a re-transmission timer when each outbound segment is
handed down to IP. If no acknowledgment has been received for the data
in a given segment before the timer expires, then the segment is
retransmitted, up to the TcpMaxDataRetransmissions times. The default
value for this parameter is 5.
The re-transmission timer is initialized to 3 seconds when a TCP
connection is established; however it is adjusted "on the fly" to
match the characteristics of the connection using Smoothed Round Trip
Time (SRTT) calculations as described in RFC793. The timer for a given
segment is doubled after each re-transmission of that segment. Using
this algorithm, TCP tunes itself to the "normal" delay of a
connection. TCP connections over high-delay links will take much
longer to time out than those over low-delay links.
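That 3-second initial value, doubled after each retransmission, is why timeouts slightly above multiples of 3 seconds cover a packet loss. A small Python sketch (illustrative only, not Windows or HAProxy code) of that schedule, using the defaults quoted above:

    # Illustrative sketch of the classic schedule described above: a 3-second
    # initial retransmission timeout (RTO) that doubles after each retransmission.
    INITIAL_RTO = 3.0        # seconds, classic initial value
    MAX_RETRANSMISSIONS = 5  # Windows TcpMaxDataRetransmissions default

    def retransmission_schedule(initial_rto=INITIAL_RTO, retries=MAX_RETRANSMISSIONS):
        """Yield (attempt, seconds since the original send) for each retransmission."""
        elapsed, rto = 0.0, initial_rto
        for attempt in range(1, retries + 1):
            elapsed += rto       # timer expires, segment is retransmitted
            yield attempt, elapsed
            rto *= 2             # back off: the timer doubles each time

    for attempt, elapsed in retransmission_schedule():
        print(f"retransmission {attempt} after ~{elapsed:.0f}s")
    # Prints 3s, 9s, 21s, 45s, 93s. A client timeout of 4-5 seconds therefore
    # rides out one lost packet; slightly above 9 seconds rides out two.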
I read a little about PAWS (Protection Against Wrapped Sequences). It's very interesting. I didn't know such complicated things were implemented to guarantee the reliability of TCP. Without PAWS, at high data rates a delayed old packet can be received and mistaken for a new packet.
I hadn't thought much about this before, but now I have started to wonder how long a packet can stay in the network (especially a UDP packet, if the type of packet matters). A packet can be delayed and sit in the network for a while before it is delivered, but it can only stay there for a short period of time, right?
In other words, how long should one wait for a (UDP) packet before concluding that it will never arrive?
If there is an answer, how is it determined? How can it be estimated (for writing programs that deal with packet timeouts)?
A simplified example: a server received 2 UDP packets. Each contains an integer to indicate its order. It got No.1 and No.3, so it knows No.2 is either delayed or lost. If, after a period of time, No.2 still hasn't arrived, it concludes the packet is lost and no longer exists anywhere in the network (so it won't cause any trouble for new packets in the future, similar to the problem PAWS aims to solve). But how long should the server wait before concluding that No.2 no longer exists?
See RFC 791, section 3.2:
Time to Live
The time to live is set by the sender to the maximum time the
datagram is allowed to be in the internet system. If the datagram
is in the internet system longer than the time to live, then the
datagram must be destroyed.
This field must be decreased at each point that the internet header
is processed to reflect the time spent processing the datagram.
Even if no local information is available on the time actually
spent, the field must be decremented by 1. The time is measured in
units of seconds (i.e. the value 1 means one second). Thus, the
maximum time to live is 255 seconds or 4.25 minutes. Since every
module that processes a datagram must decrease the TTL by at least
one even if it process the datagram in less than a second, the TTL
must be thought of only as an upper bound on the time a datagram may
exist. The intention is to cause undeliverable datagrams to be
discarded, and to bound the maximum datagram lifetime.
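As an aside, the TTL a sender stamps on its datagrams can be set per socket. A minimal sketch (Python on Linux; the value 32 is an arbitrary example):

    import socket

    # Illustrative only: create a UDP socket and set the TTL that the kernel
    # will place in the IP header of outgoing datagrams.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, 32)  # arbitrary example value
    print("outgoing TTL:", sock.getsockopt(socket.IPPROTO_IP, socket.IP_TTL))
    # In practice routers decrement the TTL once per hop rather than per second,
    # so the field behaves as a hop limit and only an upper bound on lifetime.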
UDP is a fire-and-forget, best-effort protocol. There is no expectation by the receiving host that the UDP packet is coming. Upper layers can use their own guarantees or expectations, but UDP has none.
UDP doesn't wait on packets the way TCP does.
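So any wait-for-the-missing-datagram logic has to live in the application itself. A rough sketch of the No.1/No.3 example from the question (Python; the port, packet format and 2-second deadline are made up for illustration):

    import socket
    import struct
    import time

    GAP_TIMEOUT = 2.0                  # seconds to wait for a missing number (arbitrary)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 9999))       # hypothetical port
    sock.settimeout(0.1)               # poll in small slices so deadlines can be checked

    received = set()
    deadlines = {}                     # sequence number -> time at which we give up
    expected_next = 1

    while True:
        try:
            data, _ = sock.recvfrom(1500)
            (seq,) = struct.unpack("!I", data[:4])   # packets carry a 4-byte sequence number
            received.add(seq)
            for missing in range(expected_next, seq):      # a gap starts a timer
                deadlines.setdefault(missing, time.monotonic() + GAP_TIMEOUT)
            expected_next = max(expected_next, seq + 1)
        except socket.timeout:
            pass
        now = time.monotonic()
        for seq, deadline in list(deadlines.items()):
            if seq in received:
                del deadlines[seq]                          # it showed up late after all
            elif now > deadline:
                print(f"packet {seq} declared lost")        # UDP itself will never tell us
                del deadlines[seq]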
Latency (delay) is defined here as the time that a packet spends travelling between sender and receiver.
As far as I understand, the above definition applies to IP packets. Can we say latency includes the retransmission time for missing frames at the data link layer? Or does this definition assume there are no missing frames?
Is it possible to define latency at the application level? Say we have an application A that uses TCP to send messages to a remote application. Since TCP is used, missing segments will be retransmitted, so the latency of an A message would include the retransmission time of those missing segments.
Can we say latency includes the retransmission time for missing frames at the data link layer? Or does this definition assume there are no missing frames?
If you're measuring application latency, you can define latency to include the time it takes for missing TCP segments to be retransmitted.
Is it possible to define latency at the application level? Say we have an application A that uses TCP to send messages to a remote application. Since TCP is used, missing segments will be retransmitted, so the latency of an A message would include the retransmission time of those missing segments.
This measurement is very feasible; obviously you will need to implement the latency measurement within your application. Also be aware that Nagle's algorithm could skew your latency measurements upwards if your messages are typically larger than the TCP MSS (1460 bytes on standard Ethernet segments). If your messages tend to be larger than the TCP MSS, disable Nagle to get the lowest average message latency.
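For example, a minimal measurement sketch (Python; the host, port and message size are placeholders), with TCP_NODELAY used to disable Nagle:

    import socket
    import time

    HOST, PORT = "example.com", 7000      # placeholder echo-style service
    MESSAGE = b"x" * 2000                 # deliberately larger than a 1460-byte MSS

    sock = socket.create_connection((HOST, PORT))
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)   # disable Nagle

    start = time.monotonic()
    sock.sendall(MESSAGE)
    reply = sock.recv(4096)               # this wait includes any TCP retransmissions
    elapsed = time.monotonic() - start
    print(f"message round trip: {elapsed * 1000:.1f} ms")
    sock.close()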
Because of the geographic distance between server and client, network latency can vary a lot, so I want to get the "pure" request-processing time of the service without the network latency.
I want to estimate network latency as the TCP connect time; as far as I understand, this time depends mostly on the network.
The main idea is to compute:
TCP connecting time,
TCP first packet receive time,
Get "pure" service time = TCP first packet receive (waiting time) – TCP connecting.
I divide the TCP connect time by 2 because there are in fact two request-responses (the 3-way handshake).
I have two questions:
Should I compute TCP all packets receive time instead of only first packet?
Is this method okay in general?
PS: As a tool I use Erlang's gen_tcp. I can show the code.
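A rough sketch of the idea, in Python for illustration (my real code uses gen_tcp; the host, port and request below are placeholders):

    import socket
    import time

    HOST, PORT = "service.example", 8080        # placeholders
    REQUEST = b"GET / HTTP/1.0\r\nHost: service.example\r\n\r\n"

    t0 = time.monotonic()
    sock = socket.create_connection((HOST, PORT))
    connect_time = time.monotonic() - t0        # roughly one RTT as seen by the client

    t1 = time.monotonic()
    sock.sendall(REQUEST)
    first_chunk = sock.recv(4096)               # wait for the first response packet
    first_packet_time = time.monotonic() - t1
    sock.close()

    # The idea: subtract a network-latency estimate derived from the connect time
    # from the time-to-first-packet to approximate the "pure" service time.
    pure_service_time = first_packet_time - connect_time
    print(f"connect {connect_time*1000:.1f} ms, first packet {first_packet_time*1000:.1f} ms, "
          f"estimated service time {pure_service_time*1000:.1f} ms")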
If anything, I guess the "pure" service time = TCP first packet receive - TCP connecting; make sure you don't have it written the other way round.
A possible answer to your first question: you should ideally compute at least some sort of average over the pure service times of many packets, rather than using just the first packet.
Ideally you could also report worst-case, average-case, and best-case service times.
To answer the second question, we would need to know why you want the pure service time only. Since it is a network application, network latencies (connection time etc.) should also be included in the "response time", not just the pure service time. That is my view based on the given information.
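For the worst/average/best-case figures mentioned above, the aggregation itself is trivial (purely illustrative numbers):

    # Purely illustrative: aggregate many "pure service time" samples (seconds)
    # into best-, average- and worst-case figures.
    samples = [0.042, 0.037, 0.120, 0.051, 0.045]   # made-up measurements
    best, worst = min(samples), max(samples)
    average = sum(samples) / len(samples)
    print(f"best {best*1000:.0f} ms, average {average*1000:.0f} ms, worst {worst*1000:.0f} ms")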
I worked on a similar question while at a network performance monitoring vendor in the past.
IMHO, there are a few questions to ask before proceeding:
Connection time and latency: if you base your network latency metric on the TCP connect time, be aware that the setup involves 3 steps: the client sends a TCP SYN, the server responds with a SYN-ACK, and the client responds with a final ACK to set up the TCP connection. This means the connect time is equivalent to 1.5 RTT (round-trip times). This validates taking the first two steps of the TCP setup process into account, as you mention.
Taking later TCP exchanges into account: while this at first sounds like a great idea for continuing to evaluate network latency over the course of the session, it becomes a lot trickier. Here is why:
1. Not all packets have to be acknowledged (RFC 1122, or https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment), which generates false measurements when it happens, so you will need a heuristic to exclude these from your calculations.
2. Not all systems treat acknowledging packets as a high-priority task, so some high values will pollute your network latency data and simply reflect the load on the server, for example.
So if you use only the first (and reliable) measurement, you may miss some network delay variation (especially in apps using long-lasting TCP sessions).
The Wikipedia article on TCP indicates that the IP packets transporting TCP segments can sometimes be lost, and that TCP "requests retransmission of lost data".
What exactly are the rules for requesting retransmission of lost data? At what time frequency are the retransmission requests performed? Is there an upper bound on the number? Is there functionality for the client to tell the server to forget about the whole TCP segment of which a part went missing when an IP packet was lost?
What exactly are the rules for requesting retransmission of lost data?
The receiver does not request the retransmission. The sender waits for an ACK of the byte range it sent and, if none is received within a particular interval, resends the packets.
This is ARQ (Automatic Repeat reQuest). There are several ways in which it is implemented:
Stop-and-wait ARQ
Go-Back-N ARQ
Selective Repeat ARQ
These are detailed in RFC 3366.
At what time frequency are the retransmission requests performed?
The retransmission times and the number of attempts aren't enforced by the standard. They are implemented differently by different operating systems, but the methodology is fixed. (One of the ways to fingerprint OSes, perhaps?)
The timeouts are measured in terms of the RTT (Round-Trip Time). But the timeout often isn't needed, because Fast Retransmit kicks in when 3 duplicate ACKs are received.
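For reference, the standard RTO calculation from RFC 6298 is easy to sketch (slightly simplified; real stacks differ in clock granularity, minimum RTO and other details):

    # Simplified sketch of the RFC 6298 retransmission-timeout calculation.
    # ALPHA = 1/8, BETA = 1/4 and K = 4 are the constants from the RFC.
    ALPHA, BETA, K = 1 / 8, 1 / 4, 4
    MIN_RTO = 1.0                      # RFC 6298 recommends an RTO of at least 1 second

    def update_rto(srtt, rttvar, sample):
        """Return updated (srtt, rttvar, rto) after one new RTT measurement."""
        if srtt is None:               # first measurement
            srtt, rttvar = sample, sample / 2
        else:
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample)
            srtt = (1 - ALPHA) * srtt + ALPHA * sample
        return srtt, rttvar, max(MIN_RTO, srtt + K * rttvar)

    srtt = rttvar = None
    for sample in [0.120, 0.100, 0.250, 0.110]:   # made-up RTT samples, in seconds
        srtt, rttvar, rto = update_rto(srtt, rttvar, sample)
        print(f"RTT {sample:.3f}s -> SRTT {srtt:.3f}s, RTO {rto:.3f}s")
    # On an actual timeout the sender doubles the RTO (exponential backoff) before
    # retransmitting again.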
Is there an upper bound on the number?
Yes there is. After a certain number of retries, the host is considered to be "down" and the sender gives up and tears down the TCP connection.
Is there functionality for the client to tell the server to forget about the whole TCP segment of which a part went missing when an IP packet was lost?
The whole point is reliable communication. If you wanted the client to forget about some part, you wouldn't be using TCP in the first place. (UDP perhaps?)
There's no fixed time for retransmission. Simple implementations estimate the RTT (round-trip time) and, if no ACK for the sent data has been received within 2x that time, they re-send.
They then double the wait-time and re-send once more if again there is no reply. Rinse. Repeat.
More sophisticated systems make better estimates of how long it should take for the ACK as well as guesses about exactly which data has been lost.
The bottom line is that there is no hard-and-fast rule about exactly when to retransmit. It's up to the implementation. All retransmissions are triggered solely by the sender, based on a lack of response from the receiver.
TCP never drops data so no, there is no way to indicate a server should forget about some segment.
In the case of a half-open connection where the server crashes (no FIN or RESET sent to the client), if the client attempts to send data on this broken connection, each TCP segment will go un-ACKed. TCP will attempt to retransmit the packets after some timeout. How many times will TCP attempt to retransmit before giving up, and what happens in that case? How does it inform the operating system that the host is unreachable? Where is this specified in the TCP RFC?
If the server program crashes, the kernel will clean up all open sockets appropriately. (Well, appropriate from a TCP point of view; it might violate the application layer protocol, but applications should be prepared for this event.)
If the server kernel crashes and does not come back up, the number and timing of retries depends on whether the socket was already connected or not:
tcp_retries1 (integer; default: 3; since Linux 2.2)
The number of times TCP will attempt to
retransmit a packet on an established connection
normally, without the extra effort of getting
the network layers involved. Once we exceed
this number of retransmits, we first have the
network layer update the route if possible
before each new retransmit. The default is the
RFC specified minimum of 3.
tcp_retries2 (integer; default: 15; since Linux 2.2)
The maximum number of times a TCP packet is
retransmitted in established state before giving
up. The default value is 15, which corresponds
to a duration of approximately between 13 to 30
minutes, depending on the retransmission
timeout. The RFC 1122 specified minimum limit
of 100 seconds is typically deemed too short.
(From tcp(7).)
If the server kernel crashes and does come back up, it won't know about any of the sockets and will respond to those follow-on packets with RST, so the failure is detected much faster.
If any single-point-of-failure router along the way crashes but comes back up quickly enough, the connection may continue working. This requires that firewalls and routers either be stateless or, if they are stateful, have rulesets that allow preexisting connections to continue running. (Potentially unsafe; different firewall admins have different policies about this.)
The failures are returned to the program with errno set to ECONNRESET (at least for send(2)).
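On Linux you can inspect those system-wide limits directly, and per socket you can bound the retransmission phase with TCP_USER_TIMEOUT so a dead peer is detected much sooner than the default 13-30 minutes. A hedged sketch (Linux only; the 10-second value is arbitrary):

    import socket

    # Read the system-wide retry limits quoted from tcp(7) above (Linux only).
    for name in ("tcp_retries1", "tcp_retries2"):
        with open(f"/proc/sys/net/ipv4/{name}") as f:
            print(name, "=", f.read().strip())

    # Per socket, TCP_USER_TIMEOUT (milliseconds) bounds how long transmitted data
    # may remain unacknowledged before the kernel aborts the connection.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 10_000)  # arbitrary 10 s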