I have a gRPC sync server with one service and one RPC.
I am not setting a ResourceQuota on the ServerBuilder.
If n clients want to connect, gRPC will create n request handler threads. I want to cap these threads at some limit, say 10, and if that costs some latency in serving clients, that is okay.
So I tried these settings:
grpc::ServerBuilder builder;
grpc::ResourceQuota rq;
rq.SetMaxThreads(10);
builder.SetResourceQuota(rq);
builder.SetSyncServerOption(grpc::ServerBuilder::SyncServerOption::MIN_POLLERS, 1);
builder.SetSyncServerOption(grpc::ServerBuilder::SyncServerOption::MAX_POLLERS, 1);
builder.SetSyncServerOption(grpc::ServerBuilder::SyncServerOption::NUM_CQS, 1);
From another process, I am firing up 800 clients in parallel. So I expect there to be one completion queue serving all of them, with 10 threads sharing it.
However, on the client side there is an error:
"Server Threadpool Exhausted"
and none of the clients succeeds. How can I share threads between different clients?
Question
What can cause tasks to be queued in the Thread Pool while there are plenty of threads still available in the pool?
Explanation
Our actual code is too big to post, but here is the best approximation:
long running loop
{
    create Task 1
    {
        HTTP Post request (async)
        Wait
    }
    create Task 2
    {
        HTTP Post request (async)
        Wait
    }
    Wait for Tasks 1 & 2
}
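In real C# terms, the loop is roughly the following (a sketch only; the class, method, URL and payload names below are placeholders, not our actual code):

using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

internal static class Poster
{
    // Shared client, created once before the loop starts.
    private static readonly HttpClient Client = new HttpClient();

    public static async Task RunLoopAsync(CancellationToken token)
    {
        while (!token.IsCancellationRequested)   // long running loop
        {
            // Task 1: async HTTP POST, awaited inside the task
            Task task1 = Task.Run(async () =>
            {
                using var r = await Client.PostAsync(
                    "http://1.2.3.4/targetfortask1", new StringContent("payload1"));
            });

            // Task 2: async HTTP POST, awaited inside the task
            Task task2 = Task.Run(async () =>
            {
                using var r = await Client.PostAsync(
                    "http://1.2.3.4/targetfortask2", new StringContent("payload2"));
            });

            // Wait for Tasks 1 & 2
            await Task.WhenAll(task1, task2);
        }
    }
}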
The issue is that these HTTP requests, which usually take 110-120 ms, sometimes take up to 800-1100 ms.
Before you ask:
Verified no delays on the server side
Verified no delays on the network layer (tcpdump + Wireshark). When we do have such delays, the pauses are between requests; the TCP-level turn-around fits in 100 ms
Important info:
We run it on Linux.
This happens only when we run the service in a container on k8s or Docker.
If we move it outside the container, it works just fine.
How do we know it's not ThreadPool starvation?
We have added logging of the values returned by ThreadPool.GetAvailableThreads, and we see values of about 32k and 4k for available threads.
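For reference, the logging is essentially this (Console stands in for our real logger):

using System;
using System.Threading;

// Logs how many more worker and IO completion-port threads the pool could still use.
ThreadPool.GetAvailableThreads(out int workerThreads, out int completionPortThreads);
Console.WriteLine($"Available worker threads: {workerThreads}, IO threads: {completionPortThreads}");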
How do we know the tasks are queued?
We run the dotnet-counters tool and see thread pool queue sizes of up to 5 in the same second the issue occurs.
Side notes:
We control the network; we are 99.999% sure it is not the network (because you can never be sure...)
The process is not CPU throttled.
The process usually has 25-30 threads in total at any given time.
When running on k8s/Docker we tried both container and host networking - no change.
HttpClient notes:
We are using this HTTP client: https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpclient?view=net-6.0
Client instances are created before we launch the loop.
These are HTTP, not HTTPS requests
URLs are always the same per task; the server is given as an IP, like this: http://1.2.3.4/targetfortaskX
Generally - using tcpdump and Wireshark we observe two TCP streams being opened and living through the whole execution, and all requests made are assigned to one of these two streams with keep-alive. So there are no delays from DNS, TCP SYN, or source-port exhaustion.
I am trying to understand how gRPC queues are managed and whether there is any limit on gRPC queue size.
According to this SO post requests are queued:
If your server already processing maximum_concurrent_rpcs number of requests concurrently, and yet another request is received, the request will be rejected immediately.
If the ThreadPoolExecutor's max_workers is less than maximum_concurrent_rpcs then after all the threads get busy processing requests, the next request will be queued and will be processed when a thread finishes its processing.
According to this GitHub post the queue is managed by the gRPC server:
So maximum_concurrent_rpcs gives you a way to set an upper bound on the number of RPCs waiting in the server's queue to be serviced by a thread.
But this Microsoft post confused me, saying requests are queued on the client:
When the number of active calls reaches the connection stream limit, additional calls are queued in the client. Queued calls wait for active calls to complete before they are sent.
Note, though, that here Microsoft is talking about the connection stream limit. When that limit is reached, a queue is formed on the client.
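For concreteness, this is (as far as I understand it) the kind of client-side setup that Microsoft post is describing for a .NET gRPC client, assuming Grpc.Net.Client on .NET 5+; EnableMultipleHttp2Connections appears to be the knob tied to that per-connection stream limit, though I may be misreading this:

using System.Net.Http;
using Grpc.Net.Client;

// Let the client open additional HTTP/2 connections once the
// max-concurrent-streams limit of a single connection is reached,
// instead of queuing further calls on the client.
var handler = new SocketsHttpHandler
{
    EnableMultipleHttp2Connections = true
};

using var channel = GrpcChannel.ForAddress("http://1.2.3.4:5000", new GrpcChannelOptions
{
    HttpHandler = handler
});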
Are there two types of queues? One created on the server (the gRPC queue) when certain limits are hit (as mentioned above), and another created on the client when this connection stream limit is reached?
And what is the size limit of a gRPC queue? I mean, is it limited only by the underlying hardware (RAM)?
Is there any chance we can get the server to fail because of a huge queue size? Is it possible to limit this queue size?
And if we are talking about 2 different queues, can we manage and limit the one on the client too?
I am especially interested in Python's point of view.
Thanks!
P.S. I am assuming when people are talking about gRPC queues they are talking about a queue created on the server.
Say I have a webservice, used internally by other webservices, with an average response time of one minute.
What are the pros and cons of such a service with "synchronous" responses, versus having the service return the id of the request, process it in the background, and make the clients poll for results?
Are there any cons to HTTP connections that stay active for more than one minute? Does the default TCP keep-alive matter here?
Depending on your application it may matter. A couple of things worth mentioning:
HTTP protocol is sync
There is a widespread misconception that HTTP is async. HTTP is a synchronous protocol, but your client can deal with it asynchronously. E.g., when you call any service over HTTP, your HTTP client may schedule the call on a background thread (async). However, the HTTP call will be waiting until either it times out or the response is back; during all this time the HTTP call chain is waiting synchronously.
Sockets
HTTP uses sockets, and there is a hard limit on sockets. Every HTTP connection (if created anew every time) opens up a new socket. If you have hundreds of requests at a time, you can imagine how many HTTP calls are scheduled concurrently, and you may run out of sockets. Not sure about other operating systems, but on Windows, even once you are done with a request, the socket is not disposed of straight away and stays around for a couple of minutes.
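For example, in .NET the usual way to avoid burning a socket per call is to reuse one HttpClient for the lifetime of the process rather than creating one per request (a minimal sketch):

using System.Net.Http;
using System.Threading.Tasks;

public static class Api
{
    // One shared HttpClient: requests reuse pooled connections
    // instead of opening (and later abandoning) a new socket each time.
    private static readonly HttpClient Client = new HttpClient();

    public static Task<string> GetAsync(string url) => Client.GetStringAsync(url);
}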
Network Connectivity
Keeping an HTTP connection alive for a long time is not recommended. What if you lose the network partially or completely? Your HTTP request would time out and you would not know the status at all.
Keeping all these things in mind, it is better to schedule long-running tasks on a background process.
If you keep the user waiting while your long job is running on the server, you are tying up a valuable HTTP connection while waiting.
Best practice from a RESTful point of view is to reply with HTTP 202 (Accepted) and return a response with a link to poll, as in the sketch after this list.
If you want to keep the client hanging while waiting, you should set a request timeout at the client end.
If you have firewalls in between, they might drop connections that are inactive for some time.
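A minimal ASP.NET Core sketch of that 202-plus-poll-link pattern (the controller, route, and in-memory store are made up for illustration):

using System;
using System.Collections.Concurrent;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/jobs")]
public class JobsController : ControllerBase
{
    // Illustrative in-memory result store; a background worker would write
    // the finished result here. A real service would use a durable queue.
    private static readonly ConcurrentDictionary<string, string> Results = new ConcurrentDictionary<string, string>();

    [HttpPost]
    public IActionResult Start()
    {
        var id = Guid.NewGuid().ToString("N");
        // ... hand the long-running work to a background worker here ...
        // Reply immediately with 202 Accepted and a link the client can poll.
        return AcceptedAtAction(nameof(Status), new { id }, new { id });
    }

    [HttpGet("{id}")]
    public IActionResult Status(string id) =>
        Results.TryGetValue(id, out var result)
            ? Ok(result)                                                   // finished: return the result
            : (IActionResult)Accepted(new { id, status = "processing" });  // still running
}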
Higher Response Throughput
Typically, you want your OLTP system (web server) to respond as quickly as possible. Since you are queuing the task in the background, your web server can handle more requests, which results in higher response throughput and processing capability.
More Memory Friendly
Queuing long-running tasks as background jobs via message queues prevents abusive usage of web server memory. This is good because it raises the out-of-memory threshold of your application.
More Resilient to Server Crash
If you queue the task in the background and something goes wrong, the job can be moved to a dead-letter queue, which helps you ultimately fix problems and re-process the requests that caused unhandled exceptions.
Our front-end MVC3 web application uses AsyncController, because each of our instances services many hundreds of long-running, IO-bound processes.
Since Azure will terminate "inactive" HTTP sessions after some pre-determined interval (which seems to vary depending on which website you read), how can we keep the connections alive?
Our clients MUST stay connected, and our processes will run from 30 seconds to 5 minutes or more. How can we keep the client connected/alive? I initially thought of having a timeout on the async method and just hitting the Response object with a few bytes of output, sort of like chunking the response, and then going back and waiting some more. However, I don't think this will work, since MVC3 handles the hookup of an IIS thread back to the asynchronous response, which will have already rendered a view at that time.
How can we run a really long process on an AsyncController, but have the client not be disconnected by the Azure Load Balancer? Sending an immediate response to the caller, and asking that caller to poll or check another resource URL is not acceptable.
The Azure load balancer idle timeout is 4 minutes. Can you try configuring TCP keep-alive on the client side at an interval of less than 4 minutes? That should keep the connection alive.
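For instance, if the callers are .NET clients, something like this turns on TCP keep-alive well below the 4-minute idle timeout (the 3-minute/30-second values are just an example):

using System.Net;

// enabled, keepAliveTime (ms of idle before the first probe), keepAliveInterval (ms between probes)
ServicePointManager.SetTcpKeepAlive(true, 3 * 60 * 1000, 30 * 1000);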
On the other hand, it's pretty expensive to keep a connection open per client for a long time. This will limit the number of clients you can handle per server. Also, I think IIS may still decide to close a connection regardless of keep-alives if it thinks it needs the connection to serve other requests.
The following article by Thomas Marquardt describes how IIS handles ASP.NET requests, the max/min CLR worker threads and managed IO threads that can be configured, the various request queues involved, and their default sizes.
Now as per the article, the following occurs in IIS 6.0:
ASP.NET picks up the request from an IIS IO thread and posts "HSE_STATUS_PENDING" back to the IIS IO thread
The request is handed over to a CLR worker thread
If the requests have high latency and all the threads are occupied (the thread count approaches httpRuntime.minFreeThreads), then the requests are posted to the application-level request queue (this queue is per AppDomain)
ASP.NET also checks the number of concurrently executing requests. The article states that "if the number of concurrently executing requests is too high" it queues the incoming requests to an ASP.NET global request queue (this is per worker process) (please check Update 2)
I want to know the "threshold value" at which ASP.NET considers the number of currently executing requests too high and starts queuing requests to the global ASP.NET request queue.
I think this threshold depends on the configured maximum number of worker threads, but there might be some formula by which ASP.NET determines that the number of concurrently executing requests is too high and starts queuing requests to the ASP.NET global request queue. What might this formula be? Or is this setting configurable?
Update
I read through the article again, and in the comments section I found this:
1) On IIS 6 and in IIS 7 classic mode, each application (AppDomain) has a queue that it uses to maintain the availability of worker threads. The number of requests in this queue increases if the number of available worker threads falls below the limit specified by httpRuntime minFreeThreads. When the limit specified by httpRuntime appRequestQueueLimit is exceeded, the request is rejected with a 503 status code and the client receives an HttpException with the message "Server too busy." There is also an ASP.NET performance counter, "Requests In Application Queue", that indicates how many requests are in the queue. Yes, the CLR thread pool is the one exposed by the .NET ThreadPool class.
2) The requestQueueLimit is poorly named. It actually limits the maximum number of requests that can be serviced by ASP.NET concurrently. This includes both requests that are queued and requests that are executing. If the "Requests Current" performance counter exceeds requestQueueLimit, new incoming requests will be rejected with a 503 status code.
So essentially requestQueueLimit limits the number of requests that are queued and executing (I am assuming it sums the number of requests queued in the application queues, plus the global ASP.NET request queue, plus the number of requests currently executing). Although this does not answer the original question, it does provide information about when we might receive a 503 Server Busy error because of a high number of concurrent requests or high-latency requests.
(Check update 2)
Update 2
There was a mistake on my part in the understanding: I had mixed up the descriptions for IIS 6 and IIS 7.
Essentially, when ASP.NET is hosted on IIS 7.0/7.5 in integrated mode, the application-level queues are no longer present; ASP.NET maintains a single global request queue.
So IIS 7/7.5 will start queuing requests to the global request queue if the number of executing requests is deemed too high. The question applies more to IIS 7/7.5 than to IIS 6.
As far as IIS 6.0 is concerned, there is no global ASP.NET request queue, but the following is true:
1. ASP.NET picks up the request from an IIS IO thread and posts "HSE_STATUS_PENDING" back to the IIS IO thread
2. The request is handed over to a CLR worker thread
3. If the requests have high latency and all the threads are occupied (the thread count approaches httpRuntime.minFreeThreads), then the requests are posted to the application-level request queue (this queue is per AppDomain)
4. ASP.NET also checks the number of requests queued and currently executing before accepting a new request. If this number is greater than the value specified by processModel.requestQueueLimit, incoming requests are rejected with a 503 Server Busy error.
This article might help to understand the settings a little better.
minFreeThreads: This setting is used by the worker process to queue all the incoming requests if the number of available threads in the thread pool falls below the value for this setting. This setting effectively limits the number of requests that can run concurrently to maxWorkerThreads - minFreeThreads. Set minFreeThreads to 88 * # of CPUs. This limits the number of concurrent requests to 12 (assuming maxWorkerThreads is 100).
Edit:
In this SO post, Thomas provides more detail and examples of request handling in the integrated pipeline. Be sure to read the comments on the answer for additional explanations.
A native callback (in webengine.dll) picks up request on CLR worker thread, we compare maxConcurrentRequestsPerCPU * CPUCount to total active requests. If we've exceeded limit, request is inserted in global queue (native code). Otherwise, it will be executed. If it was queued, it will be dequeued when one of the active requests completes.
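In other words, the check described in that comment boils down to something like the following (an illustrative C# sketch of the described logic, not the actual native webengine.dll code; the 5000 default is the .NET 4.x value):

using System;
using System.Collections.Concurrent;

internal static class AdmissionSketch
{
    private static readonly ConcurrentQueue<object> GlobalQueue = new ConcurrentQueue<object>();
    private static int _activeRequests;
    private const int MaxConcurrentRequestsPerCPU = 5000;   // .NET 4.x default

    // Roughly the decision described above: run now, or park in the global queue.
    // (Decrement on completion and the dequeue logic are omitted in this sketch.)
    public static void Admit(object request, Action<object> execute)
    {
        int limit = MaxConcurrentRequestsPerCPU * Environment.ProcessorCount;
        if (_activeRequests >= limit)
        {
            GlobalQueue.Enqueue(request);   // dequeued when an active request completes
        }
        else
        {
            _activeRequests++;
            execute(request);               // executes immediately on a CLR worker thread
        }
    }
}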