The gRPC Protocol Spec specifies that the grpc-timeout header uses relative values (e.g. 100m, i.e. 100 milliseconds). However, gRPC prefers absolute deadlines, and I'm struggling to understand how this absolute deadline is communicated over the wire.
For example: let's say the one-way geographic latency between the client and server is 100ms. Let's use a linear timescale in milliseconds for simplicity.
If a client makes a request at time 0 with a "grpc-timeout: 300m" header, then the server will receive the request at time 100. How does the server recognize that it only has 200ms remaining to process the request?
How does the server recognize that it only has 200ms remaining to process the request?
It doesn't. Network latency is not taken into account. Also note that in your example the server actually only has 100ms to process, because it will take 100ms for the response to be received. The round-trip time (RTT) is 200ms. Note that RTTs are dynamic (they change over time) and the network links may not be symmetric (it might be 80ms in one direction and 120ms in the other).
There's no guarantee the client and server have their clocks synchronized. So there is no "absolute" time for gRPC to rely on when communicating between them. There's also no guarantee that the clocks progress at the same rate; one clock may be slower than the other (aka clock drift). Using a timeout on-the-wire generally covers up any issues caused by clock drift.
Communicating a timeout on-the-wire but deadlines within-a-process is simple and effective. Trying to be 100% precise about the deadline would require supporting assumptions or systems (such as synchronized clocks). In practice, the RTT shouldn't be a substantial portion of the time remaining before the deadline, as that does not leave enough time for lost packets and retransmits (ignoring that gRPC might even need to establish a TCP connection). If the deadline is aggressive compared to the RTT, then either you accept that the server may compute results that go unused, or you need a vastly different networking design with different tradeoffs.
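To make the "timeout on-the-wire, deadline within-a-process" idea concrete, here is a minimal sketch in plain C (not gRPC's actual implementation): the receiver anchors the relative timeout to its own monotonic clock at the moment the request arrives, and checks the remaining budget while it works.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Monotonic "now" in nanoseconds; immune to wall-clock adjustments. */
static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void) {
    /* Suppose the request arrived carrying "grpc-timeout: 300m" (300 ms).
     * The value is relative, so the server anchors it to its local clock on
     * arrival; the 100ms already spent in transit is simply invisible to it. */
    int64_t timeout_ns  = 300LL * 1000000LL;
    int64_t deadline_ns = now_ns() + timeout_ns;

    /* While handling the request, check the remaining budget before doing
     * expensive work, and give up once the local deadline has passed. */
    int64_t remaining_ns = deadline_ns - now_ns();
    if (remaining_ns <= 0) {
        puts("deadline exceeded, abandon the request");
    } else {
        printf("%lld ms left on the local deadline\n",
               (long long)(remaining_ns / 1000000LL));
    }
    return 0;
}
```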
Related
I currently have a game server with a customizable tick rate, but for this example let's say the server only ticks once per second (1 Hz). I'm wondering what's the best way to handle incoming packets if the client's send rate is faster than the server's, as my current setup doesn't seem to work.
I have my blocking UDP receive (with a timeout) inside my tick function, and it works; however, if the client tick rate is higher than the server's, not all of the packets are received, only the one being read at that moment. So essentially the server is missing packets sent by clients. The image below demonstrates my issue.
So my question is, how is this done correctly? Is there a separate thread where packets are read constantly, queued up, and then the queue is processed when the server ticks, or is there a better way?
The image was taken from a video (https://www.youtube.com/watch?v=KA43TocEAWs&t=7s) but demonstrates exactly what I'm describing.
There's a bunch going on with the scenario you describe, but here's my guidance.
If you are running a server at 1 Hz, having a blocking socket prevent your main logic loop from running is not a great idea. There is a chance you won't be receiving messages at the rate you expect (due to packet loss, a network lag spike, or the client app closing).
A) You certainly could create another thread, continue to make blocking recv/recvfrom calls, and then enqueue the received packets onto a thread-safe data structure.
B) You could also just use a non-blocking socket and keep reading packets until the read returns -1. The OS will buffer a certain (usually configurable) amount of incoming data, and starts dropping packets if you aren't reading.
Either way is fine. However, for individual game clients I prefer the second, simpler approach (sketched below) when I know I'm on a thread that is servicing the socket at a reasonable rate (5 Hz seems pretty low, but may be appropriate for your game). If there's a chance you are stalling the servicing thread (level loading, etc.), then go with the first approach, so that a stall isn't detected as a disconnection because you missed sending/receiving a periodic keepalive message.
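For illustration, here is a rough sketch of approach B, assuming a UDP socket that has already been bound and switched to non-blocking mode with fcntl(O_NONBLOCK); enqueue_packet is a made-up stand-in for whatever your game does with each datagram.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Called once per server tick: drain everything the OS has buffered. */
void drain_socket(int fd) {
    char buf[1500];                       /* typical MTU-sized datagram buffer */
    struct sockaddr_storage from;
    socklen_t fromlen;

    for (;;) {
        fromlen = sizeof(from);
        ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                             (struct sockaddr *)&from, &fromlen);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;                    /* nothing left buffered this tick */
            perror("recvfrom");           /* real error: log it and bail out */
            break;
        }
        /* enqueue_packet(buf, n, &from);    hypothetical per-game handler */
    }
}
```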
On the server side, if I'm planning on a large number of clients/data, I go to great lengths to efficiently read from sockets - using IO Completion Ports on Windows, or epoll() on Linux.
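For the Linux side of that, here is a bare-bones sketch of a single epoll servicing pass (creation of the epoll instance, registration of sockets with epoll_ctl, and error handling are omitted for brevity):

```c
#include <sys/epoll.h>

#define MAX_EVENTS 64

/* 'ep' is an epoll instance the sockets were added to earlier with
 * epoll_ctl(ep, EPOLL_CTL_ADD, fd, ...). A short timeout keeps the game
 * loop ticking even when nothing is readable. */
int service_sockets_once(int ep) {
    struct epoll_event events[MAX_EVENTS];
    int ready = epoll_wait(ep, events, MAX_EVENTS, 10 /* ms */);
    for (int i = 0; i < ready; i++) {
        int fd = events[i].data.fd;
        (void)fd;   /* drain this socket here, e.g. with the loop shown above */
    }
    return ready;
}
```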
Your server could have a thread that ticks every 5 seconds, just like the client, to receive all the packets. Anything not received during that tick would be dropped, as the server was not listening for it. You can then pass the data from that thread to the server as one chunk after 5 ticks. The more reliable option, though, is to set the server to 5 Hz just like the client and handle every incoming client packet on a thread, so that it does not lock up the main thread.
For example, if the client update rate is 20, and the server tick rate is 64, the client might as well be playing on a 20 tick server.
In our client/server online game project, we use TCP for network transmission. We include Libevent and use a bufferevent for each connection to handle the network I/O automatically.
It worked well before, but a lag problem has surfaced recently. When I do some stress testing to make the network busier, the latency becomes extremely high: several seconds or more. The server sinks into a confusing state:
the average CPU usage decreased (repeating 0%-60%-0%-60%, waiting for something?)
the net traffic decreased (nethogs)
the clients connected to the server are still alive (netstat & tcpdump)
It looks like something magically slowed the whole system down, but new connections to the server still responded in time.
When I changed the protocol to UDP, it works well in the same situation: no obvious latency, and the system runs fast. Network traffic is around 3M/s.
The project runs on an intranet. I also tested the max download speed: nearly 18M/s.
I studied part of Libevent's header files and documentation and tried to set up a rate limit on all connections. It brought some improvement, but did not completely resolve the problem, even though I tried several different configurations. Here are my parameters: read_rate 163840, read_burst 163840, write_rate 163840, write_burst 163840, tick_len 500ms.
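For reference, this is roughly how those parameters map onto Libevent's token-bucket API (a sketch rather than the project's actual code; the bufferevent is assumed to have been created elsewhere, e.g. with bufferevent_socket_new):

```c
#include <event2/bufferevent.h>
#include <event2/util.h>
#include <sys/time.h>

static void apply_rate_limit(struct bufferevent *bev) {
    struct timeval tick = { 0, 500000 };   /* tick_len: 500 ms */
    struct ev_token_bucket_cfg *cfg = ev_token_bucket_cfg_new(
        163840, 163840,                    /* read rate, read burst   */
        163840, 163840,                    /* write rate, write burst */
        &tick);
    if (cfg != NULL)
        bufferevent_set_rate_limit(bev, cfg);
    /* Passing a NULL cfg to bufferevent_set_rate_limit removes the limit
     * again, which is handy when experimenting with these settings. */
}
```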
Thank you for your help!
TCP = Transmission Control Protocol. It responds to packet loss by retransmitting unacknowledged packets after a delay. In the case of repeated loss, it will exponentially back off. Take a look at this network capture of an attempt to open a connection to a host that is not responding:
It sends the initial SYN, and then after not getting an ack for 1s it tries again. After not getting an ack it then sends another after ~2s, then ~4s, then ~8s, and so on. So you can see that you can get some serious latency in the face of repeated packet loss.
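As a back-of-the-envelope illustration (the actual initial timeout and retry limit vary by OS and by the RTT estimate), the cumulative delay grows quickly under that doubling:

```c
#include <stdio.h>

int main(void) {
    double timeout = 1.0;   /* first retry roughly 1s after the initial SYN */
    double elapsed = 0.0;
    for (int retry = 1; retry <= 6; retry++) {
        elapsed += timeout;
        printf("retry %d sent ~%.0fs after the initial SYN\n", retry, elapsed);
        timeout *= 2;       /* 1s, 2s, 4s, 8s, ... between attempts */
    }
    return 0;
}
```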
Since you said you were deliberately stressing the network, and that the CPU usage is inconsistent, one possible explanation is that TCP is waiting to retransmit lost packets.
The best way to see what is going on is to get a network capture of what is actually transmitted. If your hosts are connected to a single switch, you can "span" a port of interest to the port of another host where you can make the capture.
If your switch isn't capable of this, or if you don't have administrative control of the switch, then you will have to get the capture from one of the hosts involved in your online game. The disadvantage of this is that taking the capture may alter what happens, and it doesn't see what is actually on the wire. For example, you might have TCP segmentation offload enabled on your interface, in which case the capture will see large packets that are later broken up by the network interface.
I would suggest installing Wireshark to analyse the network capture (which you can do in real time by using Wireshark to take the capture as well). Any time you are working with a networked system, I would recommend using Wireshark so that you have some visibility into what is actually happening on the network. The first filter I would suggest you use is tcp.analysis.flags, which will show you packets suggestive of problems.
I would also suggest turning off the rate limiting first to see what is going on (rate limiting adds another reason not to send packets, which will probably make it harder to diagnose the problem). Also, 500ms might be a longish tick_len depending on how your game operates. If your burst configuration allows the rate to be used up in 100ms, you will end up waiting 400ms before you can transmit again. The IO Graph is a very helpful feature of Wireshark in this regard. It can help you see transmission rates, although the default tick interval and unit are not very helpful for this. Here is an example of a bursty flow being rate limited to 200 Mbit/s:
Note that the tick interval is 1ms and the unit is bits/tick, which makes the top of the chart 1 Gbit/s, the speed of the interface in question.
So the Two Generals problem states that there is no deterministic way of knowing whether the other party, with whom we communicate over an unreliable channel, has received our messages. This is quite analogous to the TCP handshake, where we send a SYN, SYN-ACK, and ACK and establish a connection. Isn't this at odds with the Two Generals claim?
The Two Generals problem is indeed the asynchronous model for TCP, which is why (as the theoretical result shows) the two endpoints cannot simultaneously have common knowledge about the state of the connection.
The way every distributed agreement protocol deals with this issue is to always promise safety (nothing bad will happen) while being unable to guarantee liveness (that progress will eventually be made). Liveness is not in your hands. In good times, one can try to do one's best and hope to make progress.
In TCP it means that an endpoint can make an assumption (such as "connection established") without definitely knowing the other's state. However, it is not an unsafe assumption to make; at worst, it is a benign misunderstanding. After a timeout, it will change its opinion. It is no different from being on one end of a long-distance telephone call and continuing to talk, thinking the connection is still up; after a while, you may have to ask "hello, are you still there?", and time out. Real-world protocols must always have timeouts (unlike asynchronous formal models) because somewhere up the stack they serve some human function, and human patience is limited. In practice, there are sufficiently good stretches of time that progress can be made, so we just have to pick appropriate timeouts that don't fire too early either.
That said, even benign misunderstandings can have undesirable consequences. For example, after a server responds to the SYN, it allocates resources for the connection in the hope that the client will finish the protocol. This is a classic denial-of-service attack vector, because a rogue client can simply start the handshake sequence but never finish it, leaving an unprepared server with millions of state machines allocated. Care is required.
As far as I know, the only reason to wait for an ACK has to do with the transmit window getting exhausted. Or maybe slow start. But then this fragment of a Wireshark dump over a pre-existing TCP socket doesn't make sense to me:
Here, between packets 38 and 40, the server (45.55.162.253) waits a full RTT before continuing to send. I changed the RTT through Netem to be sure that the delay is always equal to the RTT, and as you can see, there is no application data flowing from client to server that the server might need "to continue working". But there is a very conspicuous ACK packet going from the client (packet 39) without any payload. The advertised window is a lot larger than [SEQ/ACK analysis]/[Bytes in flight], which is 1230.
My question is: is there something in TCP that triggers this wait for an ACK between packets 38 and 40 by the server?
TCP limits its transmission rate according to two separate mechanisms:
Flow Control, which is there to make sure that the sender doesn't overwhelm the other party with data. This is where the receive window comes in. Since the receive windows advertised by the client in your screenshot are large, this isn't what pauses the transfer in your case.
Congestion Control, which tries to make sure that the network isn't overwhelmed. Slow Start, which you've mentioned, is part of this mechanism in some implementations of TCP, specifically TCP Tahoe and TCP Reno, which are the variants most commonly taught in networking courses although rarely used in practice.
Since we know that flow control is not what's pausing the connection, we can assume that the culprit is the congestion control algorithm. To figure out the exact cause, however, you'd need to dive into the implementation details of the TCP variant your OS uses. For Windows, it seems to be something called Compound TCP. With recent Linux kernels, it's something called TCP CUBIC, described in this whitepaper.
The important thing to note, however, is that both mechanisms operate during the entire lifetime of the connection, not just its start. It seems that your sender paused after sending its biggest packet so far (at least among the ones shown in the screenshot), so it is possible that this packet consumed its remaining free congestion window: although the flow control window was still large, the sender was bound by the former.
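If you want to confirm this on Linux, one option (a sketch, assuming you can run code on the sending host and have the connected socket descriptor) is to query TCP_INFO and look at the congestion window the kernel reports alongside the RTT and retransmission counters:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Print a few congestion-related counters for a connected TCP socket. */
void print_cwnd(int fd) {
    struct tcp_info info;
    socklen_t len = sizeof(info);
    memset(&info, 0, sizeof(info));
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) == 0) {
        printf("cwnd: %u segments, ssthresh: %u, rtt: %u us, retransmits: %u\n",
               info.tcpi_snd_cwnd,        /* congestion window, in MSS units */
               info.tcpi_snd_ssthresh,    /* slow-start threshold            */
               info.tcpi_rtt,             /* smoothed RTT estimate           */
               info.tcpi_total_retrans);  /* retransmissions so far          */
    } else {
        perror("getsockopt(TCP_INFO)");
    }
}
```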
Because of the geographic distance between server and client, network latency can vary a lot. So I want to get the "pure" request processing time of the service, without network latency.
I want to estimate network latency as the TCP connect time. As far as I understand, this time depends mostly on the network.
The main idea is to compute:
TCP connect time,
TCP first-packet receive time,
"pure" service time = TCP first-packet receive time (waiting time) − TCP connect time.
I divide the TCP connect time by 2 because there are in fact 2 request-responses (the 3-way handshake).
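(Not the Erlang gen_tcp code itself, but as a language-neutral illustration, here is a rough C sketch of the two timestamps involved; measure() and its arguments are made up for the example, and error handling is omitted.)

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

static double seconds_since(const struct timespec *start) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) + (now.tv_nsec - start->tv_nsec) / 1e9;
}

/* Returns the question's estimate of "pure" service time:
 * time waiting for the first response packet minus the connect time. */
double measure(const struct sockaddr *addr, socklen_t addrlen,
               const char *request, size_t reqlen) {
    char buf[4096];
    struct timespec t0;
    int fd = socket(addr->sa_family, SOCK_STREAM, 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    connect(fd, addr, addrlen);                  /* 3-way handshake        */
    double connect_time = seconds_since(&t0);    /* "TCP connect" above    */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    send(fd, request, reqlen, 0);
    recv(fd, buf, sizeof(buf), 0);               /* first response packet  */
    double first_packet = seconds_since(&t0);

    close(fd);
    return first_packet - connect_time;
}
```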
I have two questions:
Should I compute the receive time of all TCP packets instead of only the first packet?
Is this method okay in general?
PS: As a tool I use Erlang's gen_tcp. I can show the code.
If anything, I guess the "pure" service time = TCP first-packet receive time − TCP connect time. You have written it the other way round.
A possible answer to your first question is: you should ideally compute at least some sort of average by considering the pure service time of many packets rather than just the first packet.
Ideally you would also record worst-case, average-case, and best-case service times.
To answer your second question, we would need to know why you need the pure service time only. Since it is a network application, network latencies (connection time, etc.) should also be included in the "response time", not just the pure service time. That is my view based on the given information.
I have worked on a similar question when working for a network performance monitoring vendor in the past.
IMHO, there are a certain number of questions to be asked before proceeding:
Connection time and latency: if you base your network latency metric on the TCP connection time, be aware that it involves 3 steps: the client sends a TCP SYN, the server responds with a TCP SYN-ACK, and the client responds with a final ACK to set up the TCP connection. This means that the connection time is equivalent to 1.5 RTT (round-trip time). This validates taking the first two steps of the TCP setup process into account, as you mention.
Taking into account later TCP exchanges: while this at first sounds like a great idea for continuing to evaluate network latency over the course of the session, it becomes a lot trickier. Here is why: 1. Not all packets have to be acknowledged (RFC 1122, or https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment), which will generate false measurements when it occurs, so you will need a heuristic to exclude these from your calculations. 2. Not all systems consider acknowledging packets a high-priority task, so some high values will pollute your network latency data and simply reflect the load level of the server, for example.
So if you use only the first (and reliable) measurement, you may miss some network delay variation (especially in apps using long-lasting TCP sessions).