Our web servers run a Python app behind nginx + uwsgi.
Sometimes we have short spikes (2-5x the average number of requests) for a second, resulting in some requests getting a 502 if there are no workers available to handle them.
Is there a way for nginx or uwsgi to queue these requests up and serve them when workers become available?
A short increase in response time is better than getting an error ;-)
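In case it helps, a minimal sketch of one knob for this (the option names are standard uwsgi ini options; the values are illustrative, not tuned for this workload): uwsgi keeps a listen queue (socket backlog) in front of its workers, so raising it lets a one-second burst wait for a free worker instead of being refused and surfacing as a 502 in nginx:

[uwsgi]
processes = 8          ; worker count (illustrative)
listen = 1024          ; socket backlog: burst requests queue here until a worker is free

The kernel limit net.core.somaxconn must be at least as large as the listen value for the larger backlog to take effect.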
Question
What can cause tasks to be queued in the ThreadPool while there are plenty of threads still available in the pool?
Explanation
Our actual code is too big to post, but here is the best approximation (roughly, in C#; the names below are placeholders):
while (true)                                             // long-running loop
{
    Task task1 = Task.Run(async () =>
    {
        await client1.PostAsync(urlForTask1, content1);  // HTTP POST request (async), awaited
    });
    Task task2 = Task.Run(async () =>
    {
        await client2.PostAsync(urlForTask2, content2);  // HTTP POST request (async), awaited
    });
    await Task.WhenAll(task1, task2);                    // wait for Tasks 1 & 2
}
The issue is that these HTTP requests, which usually take 110-120 ms, sometimes take up to 800-1100 ms.
Before you ask:
Verified no delays on server side
Verified no delays at the network layer (tcpdump + Wireshark). When such delays occur, the pauses are between requests; the TCP-level turnaround itself fits within 100 ms.
Important info:
We run it on Linux.
This happens only when we run the service in a container on k8s or Docker.
If we move it outside a container, it works just fine.
How do we know it's not ThreadPool starvation?
We added logging of the values returned by ThreadPool.GetAvailableThreads, and we see roughly 32k and 4k available threads.
How do we know the tasks are queued?
We ran the dotnet-counters tool and see queue lengths of up to 5 in the same second the issue occurs.
Side notes:
We control the network, and we are 99.999% sure it's not the network (because you can never be sure...).
The process is not CPU throttled.
The process usually has 25-30 threads in total at any given time.
When running on k8s/Docker we tried both container and host networking - no change.
HttpClient notes:
We are using this HTTP client: https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpclient?view=net-6.0
Client instances are created before we launch the loop.
These are HTTP, not HTTPS, requests.
URLs are always the same per task; the server is given as an IP, like this: http://1.2.3.4/targetfortaskX
Generally, using tcpdump and Wireshark we observe two TCP streams opened and kept alive through the whole execution, and every request is assigned to one of these two streams. So there are no delays from DNS, TCP SYN, or source port exhaustion.
We have an ASP.NET application running on IIS on Windows Server 2016, hosted on a D48s Azure virtual machine (48 cores).
Most of the time, the web app is processing requests from regular users at a pace of 200-300 requests per second.
But at times, the server receives high incoming traffic in the form of webhooks from external sites – up to 1000-5000 requests per second.
When this happens, the CPU usage gets really high, and we noticed that the Local Security Authority Process (lsass.exe) is consuming most of the CPU at that time: it can take up to 40-50% of the total CPU usage (the orange graph is lsass.exe CPU usage):
Needless to say, the server becomes really busy, and other requests start to slow down.
By the way, each webhook request is very lightweight and adds a record to a table in SQL Server, for later processing.
We ran a load test on our server and found that lsass becomes this active only when requests are made using https. However, if the very same requests are made using http, lsass.exe is not active at all.
The second discovery was that when using http the server was able to process 4-5 times more requests under the same load, compared to https!
Here is a screenshot from Performance Monitor: the green line shows requests per second, and the brown line shows lsass.exe CPU usage. On the left is what happens when using http, and on the right – when using https:
So the bottom line is that:
https makes the requests 4-5 times slower.
When using https, lsass.exe starts to eat a lot of CPU resources.
Questions are:
Why is lsass.exe so active during https requests?
I found an article on the web saying that lsass.exe was used by IIS 6.0 to encrypt/decrypt https traffic, but that starting with IIS 7.0 it is no longer used. However, our experiments indicate the contrary.
I don't understand how 4000-5000 requests per second with a body of 3-5 kilobytes each can make the CPU of a 48-core server so busy.
Maybe there are some hidden SSL settings in IIS that can make https more efficient?
We found info on SSL offloading: can this be done on Azure?
Update
We created a new Azure VM from scratch (a 32-core D32s), installed IIS, and created a simple ASP.NET Web Forms app that has only one "Hello World" aspx page that does nothing (no SQL Server requests, etc.).
With JMeter we created a load test against this page, and the same pattern appeared:
1) http: the server was processing 20 000 requests per second, and lsass.exe was not active.
2) https: the server was processing only 1000-1500 requests per second, and lsass.exe was consuming 10% of total CPU.
Here is the Performance Monitor graph (http on the left, https on the right):
By the way, JMeter and the ASP.NET web app were run from the same VM, so network round-trips were minimal.
How can https be 15-20 times slower than http in this simple situation?
And what is the role of lsass.exe in this situation?
Say I have a webservice used internally by other webservices, with an average response time of 1 minute.
What are the pros and cons of such a service giving "synchronous" responses versus making the service return the id of the request, process it in the background, and have the clients poll for the results?
Are there any cons to HTTP connections that stay active for more than one minute? Does the default TCP keep-alive matter here?
Depending on your application it may matter. A couple of things are worth mentioning:
HTTP protocol is sync
There is a widespread misconception that HTTP is async. HTTP is a synchronous protocol, but your client can handle it asynchronously. E.g. when you call a service over HTTP, your HTTP client may schedule the call on a background thread (async). However, the HTTP call keeps waiting until either it times out or the response comes back, and during all this time the call chain is waiting on that one request.
Sockets
HTTP uses sockets, and there is a hard limit on sockets. Every HTTP connection (if a new one is created every time) opens a new socket. If you have hundreds of requests at a time, you can imagine how many HTTP calls are held open at once, and you may run out of sockets. Not sure about other operating systems, but on Windows, even once you are done with a request, the socket is not disposed of straight away and stays around (in TIME_WAIT) for a couple of minutes.
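To make that concrete, a minimal Python sketch (the requests library and the URL are assumptions for illustration) of the difference between opening a new connection per call and reusing one keep-alive connection:

import requests

# A new connection (and socket) per call - under load these pile up in TIME_WAIT:
for i in range(100):
    requests.post("http://example.com/hook", json={"i": i})

# One pooled keep-alive connection reused for all calls:
with requests.Session() as session:
    for i in range(100):
        session.post("http://example.com/hook", json={"i": i})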
Network Connectivity
Keeping an HTTP connection alive for a long time is not recommended. What if you lose the network partially or completely? Your HTTP request would time out and you wouldn't know its status at all.
Keeping all these things in mind, it's better to schedule long-running tasks on a background process.
If you keep the user waiting while your long job is running on the server, you are tying up a valuable HTTP connection for the whole time.
Best practice from a RESTful point of view is to reply with HTTP 202 (Accepted) and return a response with a link to poll (see the sketch below).
If you do want to keep the client waiting, you should set a request timeout at the client end.
If you have firewalls in between, they might drop connections that are inactive for some time.
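To make the 202 + polling flow concrete, here is a minimal sketch (Flask, the in-memory job store, and the route names are assumptions for illustration, not part of the original service):

import threading
import uuid

from flask import Flask, jsonify, url_for

app = Flask(__name__)
jobs = {}  # in-memory job store; a real service would use a database or a queue


def long_running_work(job_id):
    # ... the roughly one-minute piece of work goes here ...
    jobs[job_id] = {"status": "done", "result": 42}


@app.route("/jobs", methods=["POST"])
def submit():
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending"}
    threading.Thread(target=long_running_work, args=(job_id,)).start()
    # Reply immediately with 202 Accepted and a link the client can poll
    return jsonify(jobs[job_id]), 202, {"Location": url_for("status", job_id=job_id)}


@app.route("/jobs/<job_id>", methods=["GET"])
def status(job_id):
    return jsonify(jobs.get(job_id, {"status": "unknown"}))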
Higher Response Throughput
Typically, you want your OLTP system (the web server) to respond as quickly as possible. Since you're queuing the task in the background, your web server can handle more requests, which results in higher response throughput and processing capacity.
More Memory Friendly
Queuing long-running tasks as background jobs via message queues prevents excessive use of web server memory. This is good because it raises the out-of-memory threshold of your application.
More Resilient to Server Crash
If you queue the task in the background and something goes wrong, the job can be moved to a dead-letter queue, which helps you eventually fix the problem and re-process the requests that caused the unhandled exceptions.
I am trying to serve long-running requests using gunicorn and its async workers, but I can't find any examples that I can get to work. I used the example here but tweaked it to add a fake delay (a 5 s sleep) before returning the response:
import time

def app(environ, start_response):
    data = "Hello, World!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(data)))
    ])
    time.sleep(5)
    return iter([data])
Then I run gunicorn like so:
gunicorn -w 4 myapp:app -k gevent
When I open up two browser tabs and type in http://127.0.0.1:8000/ in both of them and send the requests almost at the same time, the requests appear to get processed sequentially - one returns after 5 seconds and the other returns after a further 5 seconds.
Q. I am guessing the sleep isn't gevent-friendly? But there are 4 workers, so even if the worker type were 'sync', shouldn't two workers handle the two requests simultaneously?
I just ran into the same thing and opened a question here: Requests not being distributed across gunicorn workers. The upshot is that the browser appears to serialize access to the same page. I'm guessing this has something to do with cacheability, i.e. the browser thinks the page is likely cacheable, waits until it loads, finds out it isn't, and only then makes the other request, and so on.
Give gevent.sleep a shot instead of time.sleep.
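For example, a sketch of that suggestion applied to the app from the question, leaving everything else unchanged:

import gevent

def app(environ, start_response):
    data = "Hello, World!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(data)))
    ])
    gevent.sleep(5)  # cooperative sleep: yields to the hub so the same worker can serve other requests
    return iter([data])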
It's weird that this is happening with -w 4, but -k gevent is an async worker type, so it's possible gunicorn is feeding both requests to the same worker. Assuming that's what's happening, time.sleep will block that worker process unless you use gevent.monkey.patch_all().
When using gunicorn with a non-blocking worker type like gevent, it will use ONLY ONE process to deal with requests, so it's no surprise that your 5-second work is carried out sequentially.
The async worker is useful when your workload is light and the request rate is rapid; in that case, gunicorn can use the time otherwise wasted waiting on I/O (e.g. waiting for a socket to become writable before writing the response to it) by switching to another request assigned to the same worker.
UPDATE
I was wrong.
When using gunicorn with a non-blocking worker type and multiple workers configured, each worker is a process that runs its own queue.
So if the time.sleep calls run in different processes, they run simultaneously, but when they run in the same worker, they are carried out sequentially.
The problem is that gunicorn's load balancing may not have distributed the two requests to two different worker processes. You can check which process handled a request with os.getpid().
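For instance, a sketch of that check added to the app from the question:

import os
import time

def app(environ, start_response):
    data = "Hello, World!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(data)))
    ])
    print("handled by worker pid", os.getpid())  # same pid for both requests means the same worker served them
    time.sleep(5)
    return iter([data])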
I have a server application which runs in the Amazon EC2 cloud. From my client (the browser) I make an HTTP request which uploads a file to the server, which then processes the file. If there is a lot of processing (a large file), the server always times out with a 504 (gateway timeout) error, always exactly after 120 seconds. Though I get this error, the server continues to process the request and completes it (verified by checking the database), but I cannot see the final result on my client because of the timeout.
I am clueless as to why this is happening. Has anyone faced a similar 504 timeout? Is there some intermediate proxy server, not in my control, that is timing out?
I have a similar problem and in my case I believe it is due to the connection between the Elastic Load Balancer (ELB) and the EC2 instance.
For a long-term solution I will go with the 303 Status response + back-end processing suggested by james.garriss above.
For a short-term solution, it may be possible for Amazon support to increase the ELB timeout (see their response in https://forums.aws.amazon.com/thread.jspa?messageID=491594). Unfortunately there doesn't seem to be any way to change the timeout yourself through either the API or the console.
[Update] AWS now does allow you to update the idle timeout, either through the console, the CLI, or .ebextensions configuration. See http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/config-idle-timeout.html (thanks @Daniel Patz for the update).
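For example, with the Python SDK (boto3) against a classic ELB; the load balancer name and the 300-second value below are illustrative:

import boto3

elb = boto3.client("elb")  # classic Elastic Load Balancing API
elb.modify_load_balancer_attributes(
    LoadBalancerName="my-load-balancer",            # illustrative name
    LoadBalancerAttributes={
        "ConnectionSettings": {"IdleTimeout": 300}  # idle timeout in seconds
    },
)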
Assuming that the correct status code is being returned, the problem is that an intermediate proxy is timing out. "The server, while acting as a gateway or proxy, did not receive a timely response from the upstream server specified by the URI." (http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.5) It most likely indicates that the origin server is having some sort of issue (i.e., taking a long time to process your request), so it's not responding quickly.
Perhaps the best solution is to re-craft your server app so that it responds with a "303 See Other" status code; then your client can retrieve the data at a later point, once the server is done processing and has created the final result.
Edit: Another idea is to re-craft your server app so that it responds with a "413 Request Entity Too Large" status code when the request entity is too large. This will get rid of the error, though it may make your app less useful if it can only process "small" files.
Other possible solutions:
Increase the timeout value of the proxy (if it's under your control)
Make your request to a different server (if there's another, faster server with the same app)
Make your request differently (if possible) such that you are sending less data at a time
It is possible that the browser times out during the script execution.