We have written a gRPC Async server/client to parallelize certain processing. I have a fundamental question on how the Async server works.
For example, say we make two identical async calls from the client to the same method, but give each call a different input (half of the original data each) so that we can parallelize this operation. What goes on on the server side when we do this?
My assumption was that the server would queue up these two calls and then handle them simultaneously in two separate threads. But, this does not appear to be so.
The time taken to process one single call with all the original data turns out to be less than the combined time taken by the two Async calls of the same method with half the original data for each. Why is this so?
Thanks!
This almost isn't about gRPC but about machines distributed across networks and how a single machine (appears to) process multiple tasks (concurrently).
Perhaps a thought experiment (see below) will help explain some of the differences. Neither your question nor the thought experiment considers the realistic scenario in which there is network latency, connection-management time, and other processes underway.
With a single message, once the server has received it (and assuming it has negligible other demands on its time), it can process the data and return the result (ignoring that gRPC messages must be unmarshaled from the wire into protos, possibly converted into different types for processing, and then converted back again for transmission, etc.).
With multiple messages, receipt of each one interrupts the server, and each must be unmarshaled, processed, marshaled, and transmitted.
Even if there were no other processes underway, no network latency, no connection setup/teardown, no (un)marshaling time, no per-message processing overheads (e.g. getting a database connection for each message), and no concurrent processing (messages arrive perfectly, such that as one is completed the next arrives), even then (!!) there are non-negligible processing overheads.
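A back-of-the-envelope way to see this (the symbols are made up for illustration, not measurements from your system): let P be the pure processing time for the full data set and O the fixed per-call overhead (unmarshaling, scheduling, marshaling, transmission). Then roughly:

    T_one_call         = O + P
    T_two_calls_serial = 2O + P       (the two halves end up handled one after the other)
    T_two_calls_ideal  = O + P/2      (perfectly parallel, no contention, no extra overhead)

Splitting the work only wins when the parallel saving outweighs the extra per-call overhead; if the server ends up processing the two calls largely serially, or O is large relative to P, the split version comes out slower, which matches what you are seeing.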
Single call
Your colleague (the client) emails you (the server) the data.
Upon receipt of the data, you process it and email back the results.
Multiple calls
Your colleague splits the data into multiple emails to you.
Upon receipt of each email (!), you begin processing the data.
If another email arrives before you're done, you begin processing it too.
Give yourself a couple of seconds on each document but switch between them.
When you're done with each, email back the result to your colleague.
Additionally:
before sending you the email, your colleague goes to lunch
while processing at least one job, your dog needs to be taken for a walk.
Related
I currently have a game server with a customizable tick rate, but for this example let's say that the server is only going to tick once per second (1 Hz). I'm wondering what's the best way to handle incoming packets when the client send rate is faster than the server's, as my current setup doesn't seem to work.
I have my UDP blocking receive with a timeout inside my tick function, and it works; however, if the client tick rate is higher than the server's, not all of the packets are received, only the one being read at that moment. So essentially the server is missing packets sent by clients. The image below demonstrates my issue.
So my question is, how is this done correctly? Is there a separate thread where packets are read constantly, queued up and then the queue is processed when the server ticks or is there a better way?
The image was taken from a video (https://www.youtube.com/watch?v=KA43TocEAWs&t=7s) but demonstrates exactly what I'm describing.
There's a bunch going on with the scenario you describe, but here's my guidance.
If you are running a server at 1 Hz, having a blocking socket prevent your main logic loop from running is not a great idea. There is a chance you won't receive messages at the rate you expect (due to packet loss, a network lag spike, or the client app closing).
A) You certainly could create another thread, continue to make blocking recv/recvfrom calls, and then enqueue the packets onto a thread-safe data structure.
B) You could also just use a non-blocking socket and keep reading packets until the read returns -1. The OS will buffer a certain (usually configurable) amount of incoming data before it starts dropping packets if you aren't reading (a sketch of this appears below).
Either way is fine; however, for individual game clients I prefer the second, simpler approach when I know the thread servicing the socket is running at a reasonable rate (5 Hz seems pretty low, but may be appropriate for your game). If there's a chance you are stalling the servicing thread (level loading, etc.), then go with the first approach, so you don't miss sending/receiving a periodic keepalive message and have it detected as a disconnection.
On the server side, if I'm planning on a large number of clients/data, I go to great lengths to read from sockets efficiently, using IO Completion Ports on Windows or epoll() on Linux.
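Here is a minimal sketch of option B under stated assumptions: POSIX sockets, one UDP socket already bound, and a 1500-byte receive buffer; the function names (make_non_blocking, drain_socket) are illustrative only. On Windows you would use ioctlsocket(FIONBIO) and check WSAGetLastError() for WSAEWOULDBLOCK instead.

    #include <fcntl.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <cerrno>
    #include <string>
    #include <vector>

    // Call once after binding the socket.
    void make_non_blocking(int fd) {
        int flags = fcntl(fd, F_GETFL, 0);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    }

    // Call once per server tick: read every packet the OS has buffered,
    // and stop when recvfrom() reports there is nothing left.
    void drain_socket(int fd, std::vector<std::string>& packets) {
        char buf[1500];                                   // roughly MTU-sized buffer
        for (;;) {
            sockaddr_in from{};
            socklen_t from_len = sizeof(from);
            ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                                 reinterpret_cast<sockaddr*>(&from), &from_len);
            if (n < 0) {
                if (errno == EWOULDBLOCK || errno == EAGAIN) break;  // queue empty
                break;                                    // real error: handle/log as needed
            }
            packets.emplace_back(buf, static_cast<std::size_t>(n));
        }
    }

The drained packets can then be processed as part of the same tick, so the main loop is never blocked waiting on the network.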
Your server could have a thread that ticks every 5 seconds, just like the client, to receive all the packets. Anything not received during that tick would be dropped, as the server was not listening for it. You can then pass the data over from that thread to the server as one chunk after 5 ticks. The more reliable option, though, is to set the server to 5 Hz just like the client and handle every packet that comes in from the client on its own thread, so that it does not lock up the main thread.
For example, if the client update rate is 20 and the server tick rate is 64, the client might as well be playing on a 20-tick server.
I've written a small program with the boost asio library to transfer files via TCP from a server to one or more clients.
During testing I found out that the transfer is extremely slow, about 10KiB/s. Nagle's algorithm is already disabled. If I transfer the same file via FileZilla from the same server to the same client, I get about 280KiB/s, so obviously something was very wrong.
My approach so far was to fragment each file into smaller packets of 1024 bytes, send one fragment (each fragment = one async_write call) to the client, and wait for the client's response. I need to fragment the data to allow the client to keep track of the download progress and speed. In retrospect I suppose this was rather naïve, because the server has to wait for the client's response after each fragment. To check whether this was the bottleneck, I increased the fragment size twice, giving me the following results:
a) Fragment size: 1024 bytes
Transfer speed: ~10 KiB/s
b) Fragment size: 8192 bytes
Transfer speed: ~80 KiB/s
c) Fragment size: 20000 bytes
Transfer speed: ~195 KiB/s
The results speak for themselves, but I'm unsure what to do now.
I'm not too familiar with how the data transfer is actually handled internally, but if I'm not mistaken all of my data is basically added onto a stream? If that's the case, do I need to worry about how much data I write to that stream at once? Does it make a difference at all whether I use multiple write-calls with small fragments as opposed to one write-call with a large fragment? Are there any guidelines for this?
Simply stream the data to the client without artificial packetization. Re-enable Nagle's algorithm; this is not a scenario that calls for disabling it, and having it disabled will cause small inefficiencies.
Typical write buffer sizes would be 4KB and above.
The client can issue read calls to the network one after the other. After each successful read, the client has a new estimate of the current progress that is quite accurate. Typically, there will be one successful read call for each network packet received. If the incoming rate is very high, multiple packets tend to be coalesced into one read. That's not a problem.
If that's the case, do I need to worry about how much data I write to that stream at once?
No. Just keep a write call outstanding at all times.
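As a rough illustration of "keep a write call outstanding", here is a minimal boost::asio sketch that streams a file in 64 KiB chunks, queuing the next async_write as soon as the previous one completes. The class name, chunk size, and (minimal) error handling are assumptions for the example, not a prescription; the client can still report progress from the byte count it has received so far.

    #include <boost/asio.hpp>
    #include <fstream>
    #include <memory>
    #include <string>
    #include <vector>

    using boost::asio::ip::tcp;

    // Create via std::make_shared<Sender>(...), then call start().
    class Sender : public std::enable_shared_from_this<Sender> {
    public:
        Sender(tcp::socket socket, const std::string& path)
            : socket_(std::move(socket)),
              file_(path, std::ios::binary),
              chunk_(64 * 1024) {}                      // 64 KiB per write (illustrative)

        void start() { write_next(); }

    private:
        void write_next() {
            file_.read(chunk_.data(), static_cast<std::streamsize>(chunk_.size()));
            std::size_t n = static_cast<std::size_t>(file_.gcount());
            if (n == 0) return;                         // end of file, nothing left to send

            auto self = shared_from_this();
            boost::asio::async_write(
                socket_, boost::asio::buffer(chunk_.data(), n),
                [this, self](const boost::system::error_code& ec, std::size_t /*sent*/) {
                    if (!ec) write_next();              // immediately queue the next chunk
                });
        }

        tcp::socket socket_;
        std::ifstream file_;
        std::vector<char> chunk_;
    };

With writes of this size the per-fragment round-trip wait disappears; the client just reads whatever arrives and adds it to a byte counter to show progress and speed.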
I am using a PersistentConnection for publishing large amounts of data (many small packages) to the connected clients.
It is basically a one-way flow of data (each client will call endpoints on other servers to set up various subscriptions, so they will not push any data back to the server via the SignalR connection).
Is there any way to detect that the client cannot keep up with the messages sent to it?
The example could be a mobile client on a poor connection (e.g. in a roaming situation, the speed may vary a lot). If we are sending 100 messages per second, but the client can only handle 10, we will eventually lose the messages (due to the message buffer on the server side).
I was looking for a server side event, similar to what has been done on the (SignalR) client, e.g.
protected override Task OnConnectionSlow(IRequest request, string connectionId) {}
but that is not part of the framework (for good reasons, I assume).
I have considered using the approach (suggested elsewhere on Stack Overflow) of letting the client tell the server (e.g. every 10-30 seconds) how many messages it has received; if that number differs a lot from the number of messages sent to the client, it is likely that the client cannot keep up.
The event would be used to tell the distributed backend that the client cannot keep up, and then turn down the data generation rate.
There's no way to do this right now other than coding something custom. We have discussed this in the past as a potential feature, but it isn't anywhere on the roadmap right now. It's also not clear what "slow" means, as that is up to the application to decide. There'd probably have to be some kind of bandwidth/time/message-based setting that would make this hypothetical event trigger.
If you want to hook in at a really low level, you could use OWIN middleware to replace the client's underlying stream with one that you own, so that you'd see all of the data going over the wire (you'd have to do the same for WebSockets, though, and that might be non-trivial).
Once you have that, you could write some time-based logic that determines whether the flush is taking too long, and kill the client that way.
That's very fuzzy but it's basically a brain dump of how a feature like this could work.
When we talk about capacity of a web application, we often mention the concurrent requests it could handle.
As another question of mine discussed, Ethernet uses TDM (Time Division Multiplexing), so no two signals can pass along the wire simultaneously. So if the web server is connected to the outside world through an Ethernet connection, there will be literally no concurrent requests at all; all requests will come in one after another.
But if the web server is connected to the outside world through something like a wireless network card, I believe multiple signals could arrive at the same time as electromagnetic waves. Only in this situation are there real concurrent requests to talk about.
Am I right on this?
Thanks.
I imagine "concurrent requests" for a web application doesn't get down to the link level. It's more a question of the processing of a request by the application and how many requests arrive during that processing.
For example, if a request takes on average 2 seconds to fulfill (from receiving it at the web server, through processing it in the application, to sending back the response), then the application may need to handle many concurrent requests when requests arrive frequently: roughly, concurrent requests ≈ arrival rate × average processing time, so 50 requests per second at 2 seconds each means about 100 requests in flight at any moment.
The requests need to overlap and be handled concurrently, otherwise the queue of requests would just fill up indefinitely. This may seem like common sense, but for a lot of web applications it's a real concern because the flood of requests can bog down a resource for the application, such as a database. Thus, if the application has poor database interactions (overly complex procedures, poor indexing/optimization, a slow link to a database shared by many other applications, etc.) then that creates a bottleneck which limits the number of concurrent requests the application can handle, even though the application itself should be able to handle them.
Imagining an HTTP server listening on port 80, what happens is:
a client connects to the server to request some page; it is connecting from some origin IP address, using some origin local port.
the OS (actually the network stack) looks at the incoming request's destination IP (since the server may have more than one NIC) and destination port (80) and verifies that some application is registered to handle data on that port (the HTTP server). The combination of four numbers (origin IP, origin port, destination IP, port 80) uniquely identifies a connection. If such a connection does not exist yet, a new one is added to the network stack's internal table and a connection request is passed on to the HTTP server's listening socket. From then on, the network stack just passes data for that connection on to the application.
Multiple clients can be sending requests; for each one, the above happens. So from the network's perspective, everything happens serially, since data arrives one packet at a time.
From the software perspective, the HTTP server is listening for incoming requests. The number of requests it can have queued before clients start getting errors is determined by the programmer based on the hardware capacity (this is the first bit of concurrency: there can be multiple requests waiting to be processed). For each one, it will create a new socket (as fast as possible, in order to continue emptying the request queue) and let the actual processing of the request be done by another part of the application (different threads). These processing routines will (ideally) spend most of their time waiting for data to arrive and react (ideally) quickly to it.
Since the processing of data is usually many times faster than the network I/O, the server can handle many requests while processing network traffic, even if the hardware consists of only one processor. Multiple processors increase this capability. So from the software perspective, it all happens concurrently.
How the actual processing of the data is implemented is where the key to performance lies (you want it to be as efficient as possible). Several possibilities exist (async socket operations as provided by the Socket class, threadpool, unique threads, the new parallel features from .NET 4).
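To make the accept-and-hand-off flow described above concrete, here is a minimal sketch using plain POSIX sockets and std::thread rather than the .NET APIs this answer mentions; the port, backlog value, one-thread-per-connection model, and the trivial fixed response are illustrative assumptions, not the only (or best) way to implement the processing step, and error checks are omitted for brevity.

    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstring>
    #include <thread>

    // Per-connection work: read the request, do the (mostly I/O-bound) processing,
    // write the response. Request parsing is omitted in this sketch.
    void handle_client(int client_fd) {
        char buf[4096];
        recv(client_fd, buf, sizeof(buf), 0);
        const char* reply = "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n";
        send(client_fd, reply, strlen(reply), 0);
        close(client_fd);
    }

    int main() {
        int listener = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = INADDR_ANY;
        addr.sin_port = htons(8080);                  // 8080 here to avoid needing privileges
        bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
        listen(listener, 128);                        // 128 = the "queued requests" backlog

        for (;;) {
            int client_fd = accept(listener, nullptr, nullptr);   // new socket per connection
            if (client_fd < 0) continue;
            std::thread(handle_client, client_fd).detach();       // process concurrently
        }
    }

The accept loop stays cheap so the backlog keeps draining, while the per-connection work overlaps on other threads; that is the concurrency the answer is describing.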
It's true that no two packets can arrive at exactly the same time (unless multiple network cards are in use, per Gabe's comment). However, a web request usually requires a number of packets. The arrival of these packets is interspersed when multiple requests are coming in at nearly the same time (whether over wired or wireless access). Also, the processing of these requests can overlap.
Add multi-threading (or multiple processors / cores) to the picture, and you can see how lengthy operations such as reading from a database (which requires a lot of waiting around for a response) can easily overlap even though the individual packets are arriving in a serial fashion.
Edit: Added note above to incorporate Gabe's feedback.
I am writing a 2D multiplayer game consisting of two applications, a console server and a windowed client. So far, the client has an FD_SET which is filled with connected clients, a list of my game object pointers, and some other things. In main(), I initialize listening on a socket and create three threads: one for accepting incoming connections and placing them within the FD_SET, another for processing the objects' location, velocity and acceleration and flagging them (if needed) as the ones that have to be updated on the client, and a third that uses the send() function to send update info for every object (iterating through the list of object pointers). Such a packet consists of an operation code, the packet size, and the actual data.
On the client I parse it by reading the first 5 bytes (the opcode and packet size), which are received correctly, but when I want to read the remaining part of the packet (since I now know its size), I get WSAECONNABORTED (error code 10053). I've read about this error, but can't see why it occurs in my application. Any help would be appreciated.
The error means the system closed the socket. This could be because it detected that the client disconnected, or because it was sending more data than you were reading.
A parser for network protocols typically needs a lot of work to make it robust, and you can't tell how much data you will get in a single read(): you may get more than your operation code and packet size in the first chunk you read, or you might even get less (e.g. only the operation code). Double-check that this isn't happening in your failure case.
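As a sketch of what "robust" can mean here, loop on recv() until you have exactly the bytes you need before interpreting them. It assumes the 5-byte header from the question is a 1-byte opcode followed by a 4-byte size in the sender's byte order (the question doesn't actually spell that out), and that WSAStartup has already been called.

    #include <winsock2.h>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Keep calling recv() until 'needed' bytes have arrived.
    // Returns false if the connection closed or an error occurred.
    bool recv_exact(SOCKET s, char* out, int needed) {
        int got = 0;
        while (got < needed) {
            int n = recv(s, out + got, needed - got, 0);
            if (n <= 0) return false;              // 0 = closed, SOCKET_ERROR = failure
            got += n;
        }
        return true;
    }

    // Read one framed packet: assumed layout is 1-byte opcode + 4-byte size + payload.
    bool read_packet(SOCKET s, uint8_t& opcode, std::vector<char>& payload) {
        char header[5];
        if (!recv_exact(s, header, sizeof(header))) return false;

        opcode = static_cast<uint8_t>(header[0]);
        uint32_t size = 0;
        std::memcpy(&size, header + 1, sizeof(size));   // assumes sender's byte order
        // Real code would also sanity-check 'size' against a maximum before resizing.

        payload.resize(size);
        return size == 0 || recv_exact(s, payload.data(), static_cast<int>(size));
    }

Reading this way means a short recv() (or one that returns the header plus part of the next packet's worth of data still in the buffer) can't be mistaken for a broken connection.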