Make uWSGI use all workers

My application is very heavy (it downloads some data from the internet and puts it into a zip file), and sometimes it takes more than a minute to respond (please note, this is a proof of concept). The CPU has 2 cores and internet bandwidth is at 10% utilization during a request. I launch uWSGI like this:
uwsgi --processes=2 --http=:8001 --wsgi-file=app.py
When I start two requests, they queue up. How do I make them get handled simultaneously instead? I tried adding --lazy, --master and --enable-threads in all combinations; none of them helped. Creating two separate instances does work, but that doesn't seem like good practice.

Are you sure you are not trying to make two connections from the same browser? That is generally blocked. Try with curl or wget.
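One way to rule out the browser is to fire both requests from a script and compare the timings. The following is a minimal sketch, not a definitive test: the helper names (timed_get, run_pair, looks_serialized) and the 1.6 threshold are illustrative assumptions, and the URL you pass in is assumed to be the --http=:8001 listener from the question.

```python
import threading
import time
import urllib.request


def timed_get(url, durations, index):
    """Fetch url and store how long the whole request took, in seconds."""
    # An empty ProxyHandler bypasses any configured proxy, so the request
    # goes straight to the server under test.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    with opener.open(url, timeout=300) as response:
        response.read()
    durations[index] = time.monotonic() - start


def run_pair(url):
    """Send two requests at the same moment; return (durations, wall_time)."""
    durations = [None, None]
    threads = [
        threading.Thread(target=timed_get, args=(url, durations, i))
        for i in range(2)
    ]
    start = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return durations, time.monotonic() - start


def looks_serialized(durations, factor=1.6):
    """Heuristic (assumed threshold): if one request took much longer than
    the other, it probably waited in a queue behind it."""
    return max(durations) >= factor * min(durations)
```

Calling run_pair("http://localhost:8001/") against the server from the question and passing the durations to looks_serialized gives a rough answer: two similar durations suggest both workers were used, while one duration close to double the other suggests the requests were queued.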

Related

How to send 50.000 HTTP requests in a few seconds?

I want to create a load test for a feature of my app. It uses Google App Engine and a VM. The user sends HTTP requests to App Engine. It's realistic that it gets thousands of requests in a few seconds. So I want to create a load test where I send 20.000 - 50.000 requests in a timeframe of 1-10 seconds.
How would you solve this problem?
I started by trying Google Cloud Tasks, because it seems perfect for this. You schedule HTTP requests for a specific point in time. The docs say that there is a limit of 500 tasks per second per queue. If you need more tasks per second, you can split the tasks into multiple queues. I did this, but Google Cloud Tasks does not execute all the scheduled tasks at the given time. One queue needs 2-5 minutes to execute 500 requests that are all scheduled for the same second.
I also tried a TypeScript script running asynchronous node-fetch requests, but 5.000 requests take 77 seconds on my MacBook.

I don't think you can send 50.000 HTTP requests "in a few seconds" from "your macbook". It's better to consider a dedicated load testing tool, which can be deployed onto a GCP virtual machine in order to minimize network latency and traffic costs.
The tool choice is up to you: either you need a machine type powerful enough to send 50k requests "in a few seconds" from a single virtual machine, or the tool needs to support a clustered mode so that you can start several machines and have them send the requests together at the same moment.
Given that you mention TypeScript, you might want to try the k6 tool (it doesn't scale, though), or check out "Open Source Load Testing Tools: Which One Should You Use?" to see what other options exist. None of them provides a JavaScript API, but several don't require knowledge of a programming language at all.

A tool you could consider using is siege.
It is Linux-based, and running it inside GCP avoids the additional cost of testing from a system outside GCP.
You could deploy siege on a relatively large machine or a few machines inside GCP.
It is fairly simple to set up. Since you mention that you need 20-50k requests in a span of a few seconds, note that siege by default only allows 255 concurrent users. You can raise this limit, though, so it can fit your needs.
You would need to experiment with how many connections a machine can establish, since each machine has a certain limit based on CPU, memory and number of network sockets. You could just increase the -c number until the machine gives an "Error: system resources exhausted" error or something similar. Experiment with what your virtual machine on GCP can handle.
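The "increase concurrency until the machine gives up" idea above can also be tried from a short script before committing to a tool. This is only a sketch with a thread pool and the standard library, not a substitute for siege or k6; the function names (http_get, run_load, ramp) are made up for illustration.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_get(url, timeout=10.0):
    """Send one GET request; return True on a 2xx status, False on a
    network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        return False


def run_load(send, total, concurrency):
    """Call send() `total` times using `concurrency` worker threads.

    Returns (successes, elapsed_seconds).
    """
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        outcomes = list(pool.map(lambda _: send(), range(total)))
    return sum(outcomes), time.monotonic() - start


def ramp(send, total, levels):
    """Run the same load at each concurrency level and report the
    successful requests per second that was achieved."""
    report = {}
    for concurrency in levels:
        ok, elapsed = run_load(send, total, concurrency)
        report[concurrency] = ok / elapsed if elapsed > 0 else float("inf")
    return report
```

For example, ramp(lambda: http_get(target_url), total=1000, levels=[50, 100, 200]), where target_url is a placeholder for your App Engine endpoint, shows at which level the rate stops improving on a given machine. That is roughly the point where a single VM is saturated and more machines are needed.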

How to make a UNIX socket faster?

I'm running a Google Cloud Compute VM as my application server for an app that's available on iOS and Android. The server runs Django within uWSGI, fronted with nginx. The communication between uWSGI and nginx happens through a unix file socket.
Recently I started noticing timeouts at the client end. I did a bit of experimentation and found that uWSGI sometimes errors out while writing data to the file socket. For example, a sample request that returns about 200KB of JSON data takes about 1 sec for Django to compute. But the UNIX socket seems to take another 1-2 secs, which seems too high for a 200KB response. If the client is expecting a response within 2 secs, this often leads to a write error (as shown in the screenshot below) at uWSGI. When I increase the 'max-time' parameter (the timeout) at the client end, it goes through smoothly.
I want to know if there are some configuration changes that can make reading and writing on a UNIX socket faster. 200KB is a very minor size for a JSON response from my server - so I won't be able to bring it down. And I can't have a timeout of more than 2 secs at my client (iOS or Android), for business reasons.

Several Unix entities are represented as files but are not files at all. Pipes and sockets are examples of this.
So writing to and reading from a Unix socket is not bound to file system I/O and does not share file system response times. In fact, a Unix socket is one of the fastest forms of IPC, and is more efficient than a TCP socket, since it does not use network I/O at all.
That said, here are some hints on how to solve your particular problem:
Evaluate your app for performance issues. Profile it and check where it might be spending too much time. Usually, I/O is the main villain in performance issues. Bad algorithms and linear searches on long lists are also common culprits.
Check your configuration on both the web server and your application gateway.
Check process scheduling. If everything is running on the same box, process concurrency may be an issue under heavy load. Be sure to have all processes running under proper priorities.
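To get a feel for how little time the socket itself should need, one can push a payload of the same size through a local Unix socket pair and time it. This is a rough sketch under assumptions: it uses an anonymous socket pair rather than the actual nginx/uWSGI socket file, it requires a Unix-like system, and the function name is illustrative.

```python
import socket
import threading
import time


def time_unix_socket_transfer(payload_size=200 * 1024):
    """Send payload_size bytes across a connected AF_UNIX socket pair.

    Returns (bytes_received, elapsed_seconds).
    """
    writer, reader = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    payload = b"x" * payload_size

    def send_all():
        # sendall blocks until the kernel has accepted every byte.
        writer.sendall(payload)
        # Closing the writer signals end-of-stream to the reader.
        writer.close()

    start = time.perf_counter()
    sender = threading.Thread(target=send_all)
    sender.start()

    received = 0
    while True:
        chunk = reader.recv(65536)
        if not chunk:  # empty bytes means the writer closed its end
            break
        received += len(chunk)

    sender.join()
    reader.close()
    return received, time.perf_counter() - start
```

If time_unix_socket_transfer() reports a small fraction of a second for 200KB on the server, the 1-2 seconds observed in the question are more likely spent elsewhere (application, configuration or scheduling) than in the socket.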
Good luck!

Capped performance over UDP (and sometimes TCP) in iPerf

I can't seem to figure out what's wrong with my iPerf executable. I am trying to automate the execution of iPerf using a Telnet script (this is the one I am using https://github.com/ngharo/Random-PHP-Classes/blob/master/Telnet.class.php). I'd like to know what I can do to find the reason for the bottleneck, assuming the PHP script works as expected. Basically, if I run it manually over the command line, I get the desired rates; however, if I run it remotely using the script, I get capped performance.
What I have tried is using tcpdump to output the logs while iperf is running and then reading it using Wireshark. All I can observe is that the time differences between the fragments are larger when using the script, which means the rates will be lower. I don't know what to do next after this. Any ideas what else I can look at/try? I've tried changing kernel values for buffer sizes using sysctl but this has no effect as running it manually always works anyway.
Note that I have tried to play around with all the iperf configuration options such as -w, -l, -b (I haven't tried burst mode). No success.

how to send response directly from worker to client

When Nginx is used as a reverse proxy so that the client connects to Nginx and Nginx load balances or otherwise redirects the request to a backend worker via CGI etc... what is it called and how is it implemented when the worker responds directly to the client bypassing Nginx?
The source of my question is from two places. a) erlangonxen uses Nginx and a "spawner" app to launch a huge volume of instant-on workers. However, the response still passes through the spawner (an expensive step); b) I recently scanned an article that described this solution but I can no longer find it.

You got your jargon mixed up, I believe, so I'm going to ignore the proxy bit and assume this is about CGI. In that case you should be looking at FastCGI solutions. Nginx has support for FastCGI built in.
This spawner, as you call it, is meant to provide concurrency, so that multiple CGI requests can be handled in parallel without having to spawn an interpreter for each request. Instead, the workers get spawned once and ideally live forever.
If the selection of an available worker really is a performance bottleneck, then the implementation of this FastCGI daemon is severely lacking and you should look for a better solution. Worker selection should take a fraction of the time of the worker's job.

I'm not sure if it's a jargon thing. The good news (for me anyway) is that I had read the articles and seen the diagrams... I just could not remember where. So, reverse proxy notwithstanding... I was looking for "direct server return" (DSR) and the spawner from the erlangonxen project.
I'm not certain whether or not these two technologies are going to work together. DSR seems to have fallen out of favor and I'll probably not use it at all, although in the given architecture it would seem to make sense to try: a) it limits the total number of trips and sockets; b) it allows some functions like gzip to be distributed nicely.
Anyway, "found it".

Long running-time script in PHP causes NGINX server to get very busy

I'll try to be very specific on this - it won't be easy, so please try to follow.
We have a script that runs with PHP on NGINX (PHP-FPM over FastCGI).
This script gets information from the user trying to access it, and runs an algorithm in real time. It cannot be a scheduled process running in the background.
Sometimes the page even takes between 5 and 12 seconds to load, and that's ok.
Generally, we collect data from the user, make several outgoing requests to third-party servers, collect the data, analyse it and return a response to the user.
The problem is,
There are many users running this script, and the server gets very busy, since they are all active connections on the server waiting for a response.
We have 2 servers running under 1 load balancer, and that's not enough.
Sometimes the servers have more than 1,500 active connections at a time. You can imagine how these servers respond during that timeframe.
I'm looking for a solution.
We can add more and more servers to the LB, but it just sounds absurd that it's the only solution there is.
We went over that script and optimized it to the maximum, I can promise you that.
There is no real fix for the long running time of that script, since it depends on third-party servers that take time to respond to us on live traffic.
Is there a solution you can think of that keeps this script as it is
but somehow lowers the impact of these active connections on the servers' overall functioning?
Sometimes, they simply stop responding.
Thank you very much for reading!

A 3-month-old question, I know, but I can't help thinking that:
If you're sure that the sum of the network work for all requests to the third-party servers, plus the corresponding processing of the responses inside your PHP script, is much lower than the limits of your hardware,
then your PHP script is likely busy-looping inefficiently until all responses come back from the third-party servers.
If I were dealing with such an issue I'd:
Stop using your custom external C++ curl thing, as the PHP script is busy-waiting for it anyway.
Google and read up on non-busy-looping usage of PHP's curl-multi implementation.
Hope this makes sense.

My advice is to set limited timeouts for requests and to use asynchronous requests for each third-party request.
For example, suppose your page has to display the results of 5 third-party requests. That means the script calls cURL or file_get_contents 5 times, and it blocks on each call, one after another, until the third party responds or times out. If you have to wait 10 seconds for each response, you'll wait 50 seconds in total.
User calls the script -> script waits for every request to end -> server is loaded for 50 seconds
Now, if each third-party request is sent asynchronously, the script's load time drops to the longest single request delay. So you'll have a few smaller scripts with shorter lifetimes, which decreases the load on the server.
User calls the script -> script is loaded -> requests are sent -> there are no scripts that are waiting for the response and consuming resources of your server
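The timing argument (sum of the delays when sequential, longest delay when concurrent) can be checked with a small simulation. This is a sketch in Python with threads, not the PHP curl-multi or AJAX setup discussed here; fake_third_party_call is a hypothetical stand-in that only sleeps.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_third_party_call(delay):
    """Stand-in for a slow third-party HTTP request (it only sleeps)."""
    time.sleep(delay)
    return delay


def fetch_sequentially(delays):
    """Issue the calls one after another; total time is about sum(delays)."""
    start = time.monotonic()
    results = [fake_third_party_call(d) for d in delays]
    return results, time.monotonic() - start


def fetch_concurrently(delays):
    """Issue all calls at once; total time is about max(delays)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(fake_third_party_call, delays))
    return results, time.monotonic() - start
```

With five simulated delays of 10 seconds each, fetch_sequentially takes about 50 seconds while fetch_concurrently takes about 10, which matches the reasoning above.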
May the AJAX be with you! ;)

This is a very old question, but since I had a similar problem I can share my solution. Long-running scripts impact various parts of the system and put stress on web servers (in active connections), PHP-FPM and MySQL/other databases. These tend to cause a number of knock-on effects, such as other requests starting to fail.
First, make sure you have netdata (https://github.com/netdata/netdata) installed on the server. If you are running many instances you might find that a Grafana/Prometheus setup is worth it too.
Next, make sure it can see the PHP-FPM process, MySQL and Nginx. Netdata shows many, many things, but for this problem my key metrics were:
Connections (mysql_local.connections) - is the database full of connections?
PHP-FPM Active Connections (phpfpm_local.connections) - is PHP failing to keep up?
PHP-FPM Request Duration (phpfpm_local.request_duration) - is the processing time going through the roof?
Disk Utilization Time (disk_util.sda) - this shows if the disk cannot keep up (100% = bad under load)
Users Open Files (users.files)
Make sure that you have sufficient file handles (https://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/), and the disk is not fully occupied. Both of these will block you from making stuff work, so make them big on the troublesome server.
Next check Nginx has enough resources in nginx.conf:
worker_processes auto;
worker_rlimit_nofile 30000;
events {
    worker_connections 768;
}
This will give you time to work out what is going wrong.
Next look at php-fpm (/etc/php/7.2/fpm/pool.d/www.conf):
set pm.max_spare_servers high, such as 100
set pm.max_requests = 500 - just in case you have a script that doesn't free its resources properly.
Then watch. The problem for me was that every request blocks an incoming connection. More requests to the same script will block more connections. The machine can be operating fine, but a single slow script doing a curl hit or a slow SQL statement will hold that connection for its entire duration, so 30 seconds = 1 less PHP process to handle incoming requests. Eventually you hit 500, and you run out. If you can, increase the number of FPM processes to match the frequency of slow-script requests multiplied by the number of seconds they run for. So if the script takes 2 seconds and gets hit 2 times a second, you will need a constant 4 additional FPM worker processes tied up doing nothing.
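The sizing rule in that paragraph is just arrival rate multiplied by duration. A minimal sketch of the arithmetic follows; the function name is illustrative, and the result is an average, so real traffic with bursts needs some headroom on top.

```python
import math


def workers_tied_up(requests_per_second, seconds_per_request):
    """Average number of FPM workers occupied by the slow script
    (arrival rate multiplied by how long each request holds a worker)."""
    return math.ceil(requests_per_second * seconds_per_request)


# The example from the text: a 2-second script hit 2 times a second.
print(workers_tied_up(2, 2))  # 4
```

The same function shows why a 30-second script is so costly: at just one hit every two seconds, workers_tied_up(0.5, 30) already comes to 15 workers.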
If you can do that, stop there - the extra effort beyond that is probably not worth it. If it still feels like it will be an issue, create a second php-fpm instance on the box and send all requests for the slow script to that new instance. This allows you to fail those requests separately if they run for too long. It gives you the power to do two important things:
Control the amount of resources devoted to the slow script
Ensure that all other scripts are never blocked by the slow script and (assuming the OS limits are high enough) never affected by resource limits.
Hope that helps someone struggling under load!
