I have the following configuration:
worker_processes 4;
But I noticed that it always hits only 1 worker.
I am testing on a local CentOS VM. I am making curl HTTP calls to a specific port; I put 1,000 curl requests in a file and ran them from multiple terminal windows.
But I see all of them hit only 1 worker. Is there a way to get at least more than 1 worker handling requests? Can someone please share their knowledge on this?
https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/
In the epoll-and-accept the load balancing algorithm differs: Linux seems to choose the last added process, a LIFO-like behavior. The process added to the waiting queue most recently will get the new connection. This behavior causes the busiest process, the one that only just went back to event loop, to receive the majority of the new connections. Therefore, the busiest worker is likely to get most of the load.
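If the goal is simply to spread connections across workers, one commonly suggested mitigation (assuming nginx 1.9.1 or newer; the port below is a placeholder for your test port) is the reuseport listen option, which gives each worker its own listening socket and lets the kernel balance new connections instead of the LIFO accept behaviour described above:

worker_processes 4;

events { }

http {
    server {
        listen 8080 reuseport;    # per-worker listen sockets; the kernel spreads new connections
        location / {
            return 200 "ok\n";    # placeholder handler for the curl test
        }
    }
}

Note also that with short, mostly sequential curl requests a single worker can often keep up on its own, so even with reuseport you may only see real spreading under genuinely concurrent load.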
I have a Jenkins master which has 3 Docker slaves and 2 VM slaves. I have installed Jenkins as a service on Red Hat Linux. The CPU utilization sometimes goes up to 100%, and I then have to reboot the box. When I check the processes, I can see that there is a main Jenkins master process and several other child Jenkins processes (exact replicas of the master process) that are hung and causing the spike (confirmed through New Relic).
I have been trying to reproduce this issue but have been unsuccessful so far.
Below are my queries:
I know the previous process ID; can I get some logs or dumps related to it after the server restart?
Is there a better approach to troubleshooting this, so that I can narrow down the issue?
At this point I am unable to understand where these child processes are being spawned from and how I can find the culprit.
I'm executing a load test against an application hosted in Azure. It's a cloud service with 3 instances behind an internal load balancer (hash-based load balancing mode).
When I execute the load test, it queues requests even though the req/sec and total current requests to IIS are quite low. I'm not sure what the problem could be.
Any suggestions?
I'm adding a few screenshots of performance counters which might help you decide.
Edit-1: Per request from Rohit Rajan,
The cloud service has 2 instances (meaning 2 VMs), each with 14 GB of RAM and 8 cores.
I'm executing a step load pattern: starting with 100 users and adding 100-150 users every 5 minutes, over 4-5 hours, until the load reaches 10,000 VUs.
Calls to external systems are written asynchronously. Database calls are synchronous.
There is no straightforward answer to your question. One possible way forward is to explore additional investigation options.
Based on your explanation, there seems to be a bottleneck within the application which is causing the requests to queue up.
In order to investigate this, collect a memory dump when you see the requests queuing up and then use DebugDiag to run a hang analysis on it.
There are several ways to gather the memory dump.
Task Manager
Procdump.exe
Debug Diagnostics
Process Explorer
Once you have the memory dump you can install Debug Diagnostics and then run an analysis on it. It will generate a report which can help you get started.
Debug Diagnostics download: https://www.microsoft.com/en-us/download/details.aspx?id=49924
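For example, if you go the Procdump route, a full user-mode dump of the IIS worker process can be captured with something roughly like the following (w3wp.exe and the output path are assumptions; adjust them to whatever process actually hosts your role):

procdump.exe -accepteula -ma w3wp.exe C:\dumps\w3wp_full.dmp

You can then load the resulting .dmp into Debug Diagnostics and run the hang analysis mentioned above.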
I have a very simple PHP socket server that runs on my machine. I created a convenience class with simple methods like "restart", "stop", etc. to control the server once it is already running.
What the restart function does is send the server the command to stop, then fork the process and start a new instance of the socket server within the child process, while the parent process returns to the caller.
This works great on the command line; however, I am trying to make an admin web page which restarts the socket server, and the forking is causing problems in php-fpm. What appears to be happening is that the life of the "page load" does not end when the parent process ends, and nginx/php-fpm do not reassign the process to new connections until the forked process also ends.
In my case, I do not want the forked process to ever end, so I end up with a completely debilitated server. (In my test environment, for simplicity, I have the worker pool set to only 1; in a production environment this will be higher, but the issue would still leave one worker slot permanently occupied.)
I have tried a few things, including calling fastcgi_finish_request() just prior to forking, but this had no effect.
How can I restart my service from a php-fpm worker process without locking up an assignment in the nginx connection pool?
The solution was simple and elementary.
My child processes were not redirecting STDOUT and STDERR to /dev/null, so even though the parent process finished, the child process still had active file descriptors. That caused php-fpm to consider the connection in its pool as still active, and it would never be reassigned to new connections, because the child process runs continually.
Redirecting STDERR and STDOUT to /dev/null caused php-fpm to correctly reassign connections while simultaneously allowing the child process (the socket server) to run forever. Case closed.
./some/command >/dev/null 2>&1
Should have seen this one a mile off...
(I solved this problem months ago, but haven't signed into this account in a long time ... it didn't take me 7 months to figure out I need to direct output to /dev/null!)
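In php-fpm terms, the restart path ends up looking something like the sketch below (the script path is a placeholder); the shell redirection is what stops the new server instance from holding on to the request's descriptors:

<?php
// Hypothetical restart helper running inside a php-fpm worker.
// Start a fresh, fully detached server instance: stdout/stderr go to
// /dev/null and the trailing & backgrounds it, so php-fpm sees no
// open descriptors left behind once this request finishes.
exec('php /path/to/socket-server.php > /dev/null 2>&1 &');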
Sounds like you have your protocol design wrong. The server should be capable of restarting itself. There are quite a few examples of how to do that on the internet.
The client (your webpage) should not need to do any forking.
The server should also not run inside php-fpm, but should be a console application that uses a daemon(3)-type interface to detach itself from the console.
I have a web server (nginx) running Debian, and php5-fpm randomly seems to crash; it replies with 504 Bad Gateway when I call PHP files.
When it is in this crashed state and I run sudo /etc/init.d/php5-fpm status, it says that it is running, but it still gives 504 Bad Gateway until I do sudo /etc/init.d/php5-fpm restart.
I'm thinking that it may have to do with one of my PHP files, which sits in an infinite loop until a certain event occurs (a change in a MySQL database) or until it is timed out. I don't know whether that is generally a good approach, or whether I should make the loop quit itself before a timeout occurs.
Thanks in advance!
First, look at the nginx error.log for the actual error. I don't think PHP crashed; more likely your loop is using all available php-fpm processes, so there is none free to serve your next request from nginx. That should produce a timeout error in the logs (nginx will wait some time for an available php-fpm process).
Regarding your second question: you should not use infinite loops for this. And if you do, insert a sleep() call inside the loop - otherwise you will overload your CPU with that loop, and also the database with queries.
Also, I would guess it is enough to have one PHP process in that loop waiting for the event. In that case, use some type of semaphore (a file, or a flag in the database) to let other processes know that one is already waiting for the event. Otherwise you will always eat up all available PHP processes.
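To make that concrete, here is a rough sketch of a bounded polling loop (the PDO DSN, table and column names are made up for illustration):

<?php
// Poll for the event with a sleep between checks and a hard deadline,
// instead of spinning flat out until php-fpm times the request out.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$deadline = time() + 25;                 // stay below your fastcgi/PHP timeouts

while (time() < $deadline) {
    $row = $pdo->query('SELECT status FROM events WHERE id = 1')->fetch();
    if ($row && $row['status'] === 'done') {
        echo 'event happened';
        exit;
    }
    sleep(1);                            // don't hammer the CPU or the database
}
echo 'timed out, try again later';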
I'll try to be very specific on this - it won't be easy, so please try to follow.
We have a PHP script that runs on nginx with PHP-FPM (FastCGI).
This script gets information from the user trying to access it and runs some algorithm in real time. It cannot be a scheduled process running in the background.
Sometimes it even takes the page 5-12 seconds to load - and that's OK.
Generally, we collect data from the user, make several outgoing requests to third-party servers, collect the data, analyse it and return a response to the user.
The problem is,
There are many users running this script, and the server gets very busy - since they are all active connections on the server, waiting for a response.
We have 2 servers running under 1 load balancer, and that's not enough.
Sometimes the servers have more than 1,500 active connections at a time. You can imagine how these servers respond in that timeframe.
I'm looking for a solution.
We can add more and more servers to the LB, but it just sounds absurd that it's the only solution there is.
We have gone over that script and optimized it as much as possible, I can promise you that -
There is no real fix for the long running time of that script, since it depends on third-party servers that take time to respond to us on live traffic.
Is there a solution you can think of to keep this script as it is -
but somehow lower the impact of these active connections on the overall functioning of the servers?
Sometimes they simply stop responding.
Thank you very much for reading!
This is a 3-month-old question, I know, but I can't help thinking that:
if you are sure that the sum of the network work for all requests to the third-party servers, plus the corresponding processing of the responses inside your PHP script, is well below the limits of your hardware,
then your PHP script is probably busy-looping inefficiently until all the responses come back from the third-party servers.
If I were dealing with such an issue, I'd:
Stop using your custom external C++ curl thing, as the PHP script is busy-waiting for it anyway.
Google and read up on non-busy-looping usage of PHP's curl_multi implementation (something along the lines of the sketch below).
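A hedged sketch of that approach, with placeholder URLs - the point is that curl_multi_select() blocks until there is network activity instead of the script spinning:

<?php
// Fetch several third-party URLs concurrently and wait on the network
// inside curl_multi_select() rather than busy-looping.
$urls = ['https://api.example.com/a', 'https://api.example.com/b'];

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);            // hard cap per request
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh, 1.0);                  // sleep until activity, up to 1s
    }
} while ($running && $status === CURLM_OK);

$responses = [];
foreach ($handles as $ch) {
    $responses[] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);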
Hope this makes sense.
My advice is to set limited timeouts for requests and to use asynchronous requests for each third-party request.
For example, say your page has to display the results of 5 third-party requests. That means the script calls cURL or file_get_contents 5 times and freezes while waiting on each third party in turn. If you have to wait up to 10 seconds for each response, that is 50 seconds in total.
User calls the script -> script waits for everything to finish -> server is loaded for 50 seconds
Now, if each request to a third party is sent asynchronously, the script's load time drops to the single largest request delay. You end up with a few smaller scripts living shorter lives, which decreases the load on the server.
User calls the script -> script is loaded -> requests are sent -> there are no scripts waiting for responses and consuming resources on your server
May the AJAX be with you! ;)
This is a very old question, but since I had a similar problem I can share my solution. Long-running scripts impact various parts of the system and cause stress on web servers (in active connections), php-fpm and MySQL/other databases. These tend to cause a number of knock-on effects, such as other requests starting to fail.
Firstly make sure you have netdata (https://github.com/netdata/netdata) installed on the server. If you are running many instances you might find having a Grafana/Prometheus setup is worth it too.
Next, make sure it can see the PHP-FPM processes, MySQL and nginx. There are many, many things netdata shows, but for this problem my key metrics were:
Connections (mysql_local.connections) - is the database full of connections?
PHP-FPM Active Connections (phpfpm_local.connections) - is PHP failing to keep up?
PHP-FPM Request Duration (phpfpm_local.request_duration) - is the time to process going through the roof?
Disk Utilization Time (disk_util.sda) - this shows if the disk cannot keep up (100% = bad under load)
Users Open Files (users.files)
Make sure that you have sufficient file handles (https://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/) and that the disk is not fully occupied. Both of these will stop things from working, so make them generous on the troublesome server.
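As a rough example (the limit values and user name are placeholders), checking and raising the open-file limit might look like this:

# current soft limit for the shell you are in
ulimit -n

# /etc/security/limits.conf - raise the cap for the web user (example values)
www-data soft nofile 65535
www-data hard nofile 65535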
Next, check that nginx has enough resources in nginx.conf:
worker_processes auto;
worker_rlimit_nofile 30000;
events {
    worker_connections 768;
}
This will give you time to work out what is going wrong.
Next look at php-fpm (/etc/php/7.2/fpm/pool.d/www.conf):
set pm.max_spare_servers high, such as 100
set pm.max_requests = 500 - just in case you have a script that doesn't free itself properly (a rough example of the resulting pool settings follows below)
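As an illustration (the numbers are only starting points to tune against your own load), the relevant part of www.conf might end up looking like:

pm = dynamic
pm.max_children = 150
pm.start_servers = 20
pm.min_spare_servers = 10
pm.max_spare_servers = 100
; recycle workers periodically in case a script does not free itself properly
pm.max_requests = 500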
Then watch. The problem for me was that every request blocks an incoming connection. More requests to the same script block more connections. The machine can be operating fine, but a single slow script doing a curl hit or a slow SQL statement will hold that connection for its entire duration, so 30 seconds = 1 less php-fpm process available to handle incoming requests. Eventually you hit 500 and you run out. If you can, increase the number of FPM processes to match the frequency of slow script requests multiplied by the number of seconds they run for. So if the script takes 2 seconds and gets hit twice a second, you will need a constant 4 additional FPM worker processes chewed up doing nothing.
If you can do that, stop there - the extra effort beyond that is probably not worth it. If it still feels like it will be an issue, create a second php-fpm instance on the box and send all requests for the slow script to that new instance (a rough sketch of this split follows after the list below). This allows you to fail those requests discretely when there is too much run time. It gives you the power to do two important things:
Control the amount of resources devoted to the slow script
Ensure that all other scripts are never blocked by the slow script and (assuming the OS limits are high enough) never affected by resource limits.
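A minimal sketch of that split, assuming the slow endpoint is slow.php (all pool, socket and path names here are made up): one extra pool in its own file, plus an nginx location that routes only that script to it.

; /etc/php/7.2/fpm/pool.d/slow.conf - dedicated pool for the slow script
[slow]
user = www-data
group = www-data
listen = /run/php/php7.2-fpm-slow.sock
listen.owner = www-data
listen.group = www-data
pm = static
; hard cap on how many workers the slow script may ever consume
pm.max_children = 10
; kill runaway requests instead of letting them pile up
request_terminate_timeout = 60s

# nginx: send only the slow script to the dedicated pool
location = /slow.php {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php7.2-fpm-slow.sock;
}

With that in place, the main pool's workers stay free for everything else even when the slow script is saturated.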
Hope that helps someone struggling under load!