gunicorn doesn't process simultaneous requests concurrently

gunicorn doesn't process simultaneous requests concurrently - asynchronous

I am trying to serve long running requests using gunicorn and its async workers but I can't find any examples that I can get to work. I used the example here but tweaked to add a fake delay (sleep for 5s) before returning the response:
def app(environ, start_response):
data = "Hello, World!\n"
start_response("200 OK", [
("Content-Type", "text/plain"),
("Content-Length", str(len(data)))
])
time.sleep(5)
return iter([data])
Then I run gunicorn so:
gunicorn -w 4 myapp:app -k gevent
When I open up two browser tabs and type in http://127.0.0.1:8000/ in both of them and send the requests almost at the same time, the requests appear to get processed sequentially - one returns after 5 seconds and the other returns after a further 5 seconds.
Q. I am guessing the sleep isn't gevent friendly? But there are 4 workers and so even if the type of worker was 'sync' two workers should handle two requests simultaneously?

I just ran into the same thing, opened a question here: Requests not being distributed across gunicorn workers . The result is, it appears that the browser serializes access to the same page. I'm guessing perhaps this has something to do w/ cacheability, i.e. the browser thinks it's likely the page is cacheable, wait until it loads finds out it isn't so it makes another request and so on.

Give gevent.sleep a shot instead of time.sleep.
It's weird that this is happening with -w 4, but -k gevent is an async worker type, so it's possible gunicorn is feeding both requests to the same client. Assuming that's what's happening, time.sleep will lock your process unless you use gevent.monkey.patch_all().

When using gunicorn with non-blocking worker type, like gevent, It will use ONLY ONE process dealing with requests, so it's no surprise that your 5-second work carried out sequentially.
The async worker is useful when your workload is light, and request rate is rapid, in that case, gunicorn can utilize times wasted on waiting IO (like, waiting for socket to be writable to write the response to it), by switching to another worker to work another request. by switching to another request assigned to the same worker.
UPDATE
I was wrong.
When using gunicorn with non-blocking worker type, with worker settings in gunicorn, each worker is a process, that runs a separate queue.
So if the time.sleep was ran on different process, it will run simultaneously, but when it's ran in the same worker, it will be carried out sequentially.
The problem is that the gunicorn loadbalancer may not have distributed the two requests into two worker processes. You can check the current process by os.getpid().

Related

.NET ThreadPool tasks queued while pool not exhausted

Question
What can cause tasks to be queued in Thread Pool while there are plenty threads still available in pool?
Explanation
Our actual code is too big to post, but here is best approximation:
long running loop
{
create Task 1
{
HTTP Post request (async)
Wait
}
create Task 2
{
HTTP Post request (async)
Wait
}
Wait for Tasks 1 & 2
}
The issue is that these HTTP requests which usually take 110-120 ms sometimes take up to 800-1100 ms.
Before you ask:
Verified no delays on server side
Verified no delays on network layer (tcpdump + wireshark). If we have such delays, there are pauses between requests, TCP level turn-around fits in 100ms
Important info:
We run it on Linux.
This happens only when we run the service in container on k8s or docker.
If we move it outside container it works just fine.
How do we know it's not ThreadPool starvation?
We have added logging values returned by ThreadPool.GetAvailableThreads and we have values of 32k and 4k for available threads.
How do we know the tasks are queued?
we run dotnet-counters tool and we see queue sizes up to 5 in same second when issue occurs.
Side notes:
we control the network, we are 99.999% sure it not it (because you can never be sure...)
process is not CPU throttled
the process usually have 25 - 30 threads in total at given time
when running on k8s/docker we tried both container and host network - no change.
HttpClient notes:
We are using this HTTP client: https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpclient?view=net-6.0
Client instances are created before we launch the loop.
These are HTTP, not HTTPS requests
URLs are always the same per task, server is given as IP, like this http://1.2.3.4/targetfortaskX
Generally - using tcpdump and wireshark we observe two TCP streams to be opened and living through whole execution and all requests made are assigned to one of these two streams with keep-alive. So no delays on DNS, TCP SYN or source port exhaustion.

Nginx/uwsgi request buffer or queue

Our web servers run a Python app behind nginx + uwsgi.
Sometimes we have short spikes (2-5x avg no requests) for a second resulting in some requests getting a 502 if there are no workers available to handle them.
Is there a way for nginx or uwsgi to queue these requests up and serve them when workers become available?
It's better with a short increase in response time rather than getting an error ;-)

How nginx reload work ? why it is zero-downtime

refer to nginx official docs . the reload command of nginx is for reload of configuration files ，and during the progress , there's no downtime of the service .
i've learned that it wait requests that already connected until it finished ,and stop accept any new request . the idea is cool , but how does it deal with the keep-live connections ? because those long-live connections won't close and there continuous request comes along .

Here's the summary:
http://nginx.org/en/docs/control.html
The master process first checks the syntax validity, then tries to
apply new configuration. If this succeeds, it starts new worker
processes, and sends messages to old worker processes requesting them
to shut down gracefully.
That means it would keep older processes handling unclosed connections while having new processes working according to the updated configuration.
From this perspective connections with keep-alive are no different from other unclosed connections.
In versions prior to 1.11.11 such "old" processes could hang indefinitely long (according to #Alexey, haven't checked it though), from 1.11.11 there’s a configuration setting controlling this
http://nginx.org/en/docs/ngx_core_module.html#worker_shutdown_timeout

Does an ASP.NET HTTP Request Translate to 1 Thread?

Is it safe to assume that when a user requests an .aspx page via HTTP, that ASP.NET creates at least 1 thread for it?
If so, how long does it last?
If 1000 people make the HTTP request to the same .aspx page, is there some recycling of threads involved, so it doesn't spawn different 1000 threads?

Each request is allocated a thread from the iis page pool. the idea is that this should be a short running process so that thread can be returned to the page pool for use by another request coming (page pool sizes are not huge, usually, like 50). So, if you have a long running request, it's important you make an async call to free the thread for some other request. then, on your long running requests completion, you will get another thread from the pool and finish up.
Bottom line, if 1000 people make requests at the same time and none of them finish, 50 or so will run and the other 950 will wait.

How can Nginx be upgraded without dropping any requests?

According to the Nginx documentation:
If you need to replace nginx binary
with a new one (when upgrading to a
new version or adding/removing server
modules), you can do it without any
service downtime - no incoming
requests will be lost.
My coworker and I were trying to figure out: how does that work?. We know (we think) that:
Only one process can be listening on port 80 at a time
Nginx creates a socket and connects it to port 80
A parent process and any of its children can all bind to the same socket, which is how Nginx can have multiple worker children responding to requests
We also did some experiments with Nginx, like this:
Send a kill -USR2 to the current master process
Repeatedly run ps -ef | grep unicorn to see any unicorn processes, with their own pids and their parent pids
Observe that the new master process is, at first, a child of the old master process, but when the old master process is gone, the new master process has a ppid of 1.
So apparently the new master process can listen to the same socket as the old one while they're both running, because at that time, the new master is a child of the old master. But somehow the new master process can then become... um... nobody's child?
I assume this is standard Unix stuff, but my understanding of processes and ports and sockets is pretty darn fuzzy. Can anybody explain this in better detail? Are any of our assumptions wrong? And is there a book I can read to really grok this stuff?

For specifics: http://www.csc.villanova.edu/~mdamian/Sockets/TcpSockets.htm describes the C library for TCP sockets.
I think the key is that after a process forks while holding a socket file descriptor, the parent and child are both able to call accept() on it.
So here's the flow. Nginx, started normally:
Calls socket() and bind() and listen() to set up a socket, referenced by a file descriptor (integer).
Starts a thread that calls accept() on the file descriptor in a loop to handle incoming connections.
Then Nginx forks. The parent keeps running as usual, but the child immediately execs the new binary. exec() wipes out the old program, memory, and running threads, but inherits open file descriptors: see http://linux.die.net/man/2/execve. I suspect the exec() call passes the number of the open file descriptor as a command line parameter.
The child, started as part of an upgrade:
Reads the open file descriptor's number from the command line.
Starts a thread that calls accept() on the file descriptor in a loop to handle incoming connections.
Tells the parent to drain (stop accept()ing, and finish existing connections), and to die.

I have no idea how nginx does it, but basically, it could just exec the new binary, carrying the listening socket with it the new process (actually, it remains the same process, it just replaces the program executing in it). The listening socket has a backlog of incoming connections, and as long as it's fast enough to boot up, it should be able to start processing them before it overflows. If not, it could probably fork first, exec, and wait for it to boot up to the point where it's ready to process incoming requests, then hand over the command of the listening socket (file descriptors are inherited when forking, both have access to it) via some internal mechanism, before exiting. Noting your observations, this looks like what it's doing (if your parent process dies, your ppid is reassigned to init, i.e. pid 1)
If it has multiple processes competing to accept on the same listening socket (again, I have no idea how nginx does it, perhaps it has a dispatching process?), then you could replace them one by one, by ordering them to exec the new program, as above, but one at a time, as to never drop the ball. Note that during such a process there would never be any new pids or parent/child relationship changes.
At least, I think that's probably how I would do it, off the top of my head.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex