We've seen that sometimes, under no heavy traffic, the number of php-fpm processes behind nginx starts to increase drastically.
We normally have 35 processes, and then all of a sudden the CPU is at 100% with 160 processes running at the same time. The last time it happened was a few seconds ago, and the time before that was two weeks ago, which is pretty weird. We do not see memory problems or anything else strange (too many requests or the like).
Do you have any idea how we can avoid creating those processes? Or what could be the cause?
Well, FPM probably creates them to handle the traffic, and if a process is at 100% CPU, then some part of your code is probably eating up the processor.
If you want to force FPM not to create more than a certain number of workers, check /etc/php5/fpm/pool.d/www.conf; there you'll find pm.max_children and related settings.
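For illustration, here is a sketch of the relevant pool directives in that file (the numbers are placeholders to tune for your server, not recommendations):

    ; /etc/php5/fpm/pool.d/www.conf (illustrative values)
    pm = dynamic
    pm.max_children = 50      ; hard cap on the number of worker processes
    pm.start_servers = 10
    pm.min_spare_servers = 5
    pm.max_spare_servers = 20
    pm.max_requests = 500     ; recycle each worker after this many requests

With pm.max_children in place, FPM will queue requests rather than spawn more than 50 workers, so a runaway script shows up as slow responses instead of a process explosion.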
I just want to understand how we should plan capacity for a NiFi instance.
We have a NiFi instance with around 500 flows, so the total number of processors enabled on the NiFi canvas is around 4000. We run 2-5 flows simultaneously, and each takes no more than half an hour, i.e. we process data in MBs.
It was working fine until now, but we are seeing OutOfMemory errors very often. So we increased the Xms and Xmx parameters from 4g to 8g, which has resolved the problem for now. But going forward we will have more flows, and we may face OutOfMemory issues again.
So, can anyone help with a capacity-planning matrix, or any suggestions for avoiding such issues before they happen? E.g. if we have 3000 processors enabled, with or without any processing, then X GB of memory is required.
Any input on NiFi capacity planning would be appreciated.
Thanks in Advance.
OOM errors can occur due to specific memory-consuming processors. For example, SplitXML loads your whole record into memory, so it could load a 1 GiB file all at once.
Each processor can document what resource considerations should be taken into account. All of the Apache processors (as far as I can tell) are documented in that regard, so you can rely on that.
In our example, by the way, SplitXML can be replaced with SplitRecord, which doesn't load the whole record into memory.
So even if you use 1000 processors simultaneously, they might not consume as much memory as one processor that loads your whole FlowFile's content into memory.
Check which processors you are using and make sure you don't have one like that (there are more processors like this one that load the whole document into memory).
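That said, if you do still need to raise the heap at some point, the Xms/Xmx values mentioned in the question normally live in NiFi's conf/bootstrap.conf (the 8g values below simply mirror what the question already uses):

    # conf/bootstrap.conf
    java.arg.2=-Xms8g
    java.arg.3=-Xmx8g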
We are using Gunicorn with Nginx. Every time we restart Gunicorn, its CPU usage starts increasing gradually again, from 0.5% to around 85% over 3-4 days. On restarting Gunicorn, it comes back down to 0.5%.
Please suggest what could cause this issue and how to go about debugging and fixing it.
Check your workers configuration. Gunicorn's docs recommend (2 × CPU cores) + 1 workers as a starting point.
Check your application; it seems that it is blocking/freezing threads. Add timeouts to all API calls, database queries, etc.
You can add APM software to analyze your application, for example Datadog.
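As a concrete illustration of the first two points, assuming a 4-core machine and a WSGI app exposed as app:application (both assumptions, not details from the question), the worker count, a worker timeout, and periodic worker recycling can all be set on the command line:

    # (2 x 4 cores) + 1 = 9 workers; kill workers stuck for more than 30 seconds;
    # recycle each worker after 1000 requests, which automates the restart the question describes.
    gunicorn app:application --workers 9 --timeout 30 --max-requests 1000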
I have an IIS application which behaves like this: the total number of threads in the IIS processes starts out low, traffic comes in at some low rate like 5 rpm, then the number of threads starts increasing alarmingly, keeps growing even after the load stops, does not come down in a reasonable time, reaches 30,000-plus threads, and response time goes for a toss.
machine.config is set to autoConfig.
There are no explicit threads in the application, though there is some "very fancy" use of Parallel.ForEach.
I'm looking for tips on how to go about diagnosing this. Reducing the Parallel.ForEach usage seemed to help, though I have yet to conclusively prove it. Limiting the maximum number of threads also helps cap the thread count, but I suspect there is something wrong with the app that causes those threads to keep increasing, and that is what I want to solve.
In the picture below, the thread count is ONLY for the IIS worker processes. The PUT requests are the only ones doing real work; the GETs are mostly static resource requests.
Can this be reproduced in a local or dev environment? If so, it's a good time to attach to the process and use the debugging tools to see which threads are managed and where they are in the code. If that fails to reveal anything, then it might be time to capture a memory dump of the process and dig into it with WinDbg.
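If the Parallel.ForEach usage mentioned in the question turns out to be the culprit, one cheap experiment is to cap its parallelism explicitly and see whether the runaway thread count stops. A minimal sketch, with hypothetical work items standing in for the real loop body:

    using System;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    class ParallelCapDemo
    {
        static void Main()
        {
            var items = Enumerable.Range(1, 100);   // hypothetical work items

            var options = new ParallelOptions
            {
                // Cap concurrency explicitly instead of letting the runtime decide.
                MaxDegreeOfParallelism = Environment.ProcessorCount
            };

            Parallel.ForEach(items, options, item =>
            {
                // Stand-in for the real per-item work.
                Console.WriteLine($"Processed {item} on thread {Thread.CurrentThread.ManagedThreadId}");
            });
        }
    }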
I have an existing asp.net app that worked for months without issues.
The problem
Suddenly (after a new code release), once every day or two, the CPU starts bouncing between 100% utilization and 0%, back and forth every few seconds.
While this is happening, ASP.NET requests are queued up, and execution time and wait time increase drastically.
Restarting the WWW Publishing Service “solves” the issue (for a day or so). A pre-emptive restart of the service also helps.
My guess at the cause
Since this started after a new release, I blame the new code, but I am looking for clues as to what it could be. My best guess would be a memory leak, but memory usage of w3wp.exe never goes over 6.5 GB, there is spare physical memory, and I do store a lot of stuff in the session.
Can anyone offer a clue?
Debugging IIS is a daunting task and one I have little experience with, so I am hoping someone else has had a similar issue and can provide a clue.
Some more notes/clues
When restarting the WWW service while this issue is occurring, stopping the service takes a long time, a good two minutes.
w3wp.exe is part of IIS, but doesn't actually do much work. The 100% CPU utilization is from the code that is running inside of that process. That's your code.
If you have free memory, then any memory leak doesn't matter. Ignore memory leaks for now.
Can you reproduce this problem on a Development machine? If so, then you need to profile the application while it's running so you can find out where the application is spending its time.
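If it cannot be reproduced locally, a common fallback (assuming you can run the Sysinternals tools on the server, which is not something the question mentions) is to let ProcDump capture a few dumps of w3wp.exe while the CPU is spiking, then compare the hot call stacks across the dumps:

    :: Write a full dump each time w3wp.exe sustains over 90% CPU for 5 seconds, up to 3 dumps.
    :: If several worker processes are running, target the specific PID instead of the name.
    procdump -ma -c 90 -s 5 -n 3 w3wp.exe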
I have a multi-threaded web application with about 1000-2000 threads in the production environment.
I expect the CPU usage to show up under w3wp.exe, but the System Idle Process eats the CPU instead. Why?
The Idle process isn't actually a real process; it doesn't "eat" your CPU time. The %CPU you see next to it is actually unused CPU (more or less).
The poor performance of your application is most likely due to your 2000 threads. Windows (or indeed any operating system) was never meant to run so many threads at a time. You're wasting most of the time just context switching between them, with each thread getting a few milliseconds of processing time roughly every 30 seconds (15 ms × 2000 = 30 s!).
Rethink your application.
The idle process is simply holding processor time until a program needs it; it's not actually eating any cycles at all. You can think of the system idle time as 'available CPU'.
System Idle Process is not a real process, it represents unused processor time.
This means that your app doesn't utilize the processor completely. It may be memory- or I/O-bound; possibly the threads are waiting for each other, or for external resources. Context-switching overhead could also be a culprit: unless you have 2000 cores, the threads are not all actually running at the same time but are assigned time slices by the task scheduler, and that switching also takes some time.
You have not provided a lot of details, so I can only speculate at this point. I would say it is likely that most of those threads are doing nothing. The ones that are doing something are probably I/O bound, meaning that they are spending most of their time waiting for an external resource to respond.
Now let's talk about the "1000-2000 threads". There are very few cases (maybe none) where having that many threads is a good idea, and I think your current issue is a perfect example of why. Most of those threads are (apparently, anyway) doing nothing but wasting resources. If you want to process multiple tasks in parallel, especially if they are I/O bound, it is better to take advantage of pooled resources like the ThreadPool or to use the Task Parallel Library.
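For example, here is a minimal sketch of the task-based approach for I/O-bound work; the URLs and HttpClient usage are purely illustrative, not something taken from the question:

    using System;
    using System.Linq;
    using System.Net.Http;
    using System.Threading.Tasks;

    class IoBoundDemo
    {
        // Hypothetical endpoints standing in for whatever external resources the threads wait on.
        static readonly string[] Urls =
        {
            "https://example.com/a",
            "https://example.com/b",
            "https://example.com/c",
        };

        static async Task Main()
        {
            using var client = new HttpClient();

            // Each call returns a Task that waits on I/O without tying up a dedicated thread;
            // the thread pool only supplies a thread when a continuation has actual CPU work to do.
            Task<string>[] downloads = Urls.Select(url => client.GetStringAsync(url)).ToArray();

            string[] bodies = await Task.WhenAll(downloads);
            Console.WriteLine($"Downloaded {bodies.Sum(b => b.Length)} characters in total.");
        }
    }

Three outstanding requests here occupy zero threads while the network I/O is in flight, whereas a thread-per-request design would park three dedicated threads (and their stacks) just to wait.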