I have a multi-threaded web application with about 1000~2000 threads in the production environment.
I expect the CPU usage to show up on w3wp.exe, but the System Idle Process is eating the CPU instead. Why?
The Idle process isn't actually a real process; it doesn't "eat" your CPU time. The %CPU you see next to it is actually unused %CPU (more or less).
The reason for the poor performance of your application is most likely your 2000 threads. Windows (or indeed any operating system) was never meant to run that many threads at a time. You're wasting most of the time just context switching between them, with each thread getting a ~15 ms time slice roughly once every 30 seconds (15 ms × 2000 = 30 s!).
Rethink your application.
The Idle process is simply holding processor time until a program needs it; it's not actually eating any cycles at all. You can think of System Idle time as "available CPU".
System Idle Process is not a real process, it represents unused processor time.
This means that your app doesn't utilize the processor completely. It may be memory-bound or I/O-bound; possibly the threads are waiting for each other, or for external resources. Context-switching overhead could also be a culprit: unless you have 2000 cores, the threads are not actually all running at the same time but are assigned time slices by the task scheduler, and that switching itself takes time.
You have not provided a lot of details, so I can only speculate at this point. I would say it is likely that most of those threads are doing nothing. The ones that are doing something are probably I/O-bound, meaning that they are spending most of their time waiting for an external resource to respond.
Now let's talk about the "1000~2000 threads". There are very few cases (maybe none) where having that many threads is a good idea, and I think your current issue is a perfect example of why. Most of those threads are (apparently, anyway) doing nothing but wasting resources. If you want to process multiple tasks in parallel, especially if they are I/O-bound, then it is better to take advantage of pooled resources like the ThreadPool or the Task Parallel Library, as sketched below.
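As a rough sketch of the difference (the URL and method name here are hypothetical, and this assumes your threads are doing I/O-bound work such as HTTP calls), compare a dedicated thread per work item with pooled, asynchronous tasks:

    using System;
    using System.Linq;
    using System.Net.Http;
    using System.Threading.Tasks;

    class Worker
    {
        static readonly HttpClient client = new HttpClient();

        // Instead of new Thread(() => Fetch(url)).Start() per item
        // (2000 items => 2000 threads), let pooled threads and async
        // I/O do the waiting: no thread is parked per request.
        static async Task FetchAllAsync(string[] urls)
        {
            var tasks = urls.Select(u => client.GetStringAsync(u));
            string[] bodies = await Task.WhenAll(tasks);
            Console.WriteLine($"Fetched {bodies.Length} responses.");
        }

        static Task Main() => FetchAllAsync(new[] { "https://example.com" });
    }

The point is not the specific API but that the waiting is handed to the OS rather than to 2000 parked threads.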
Related
I have an IIS application which behaves like this: the total number of threads in the IIS processes starts low; traffic starts at some low rate like 5 rpm; the number of threads starts increasing alarmingly and keeps growing even after the load stops; it does not come down in a reasonable time; it reaches 30,000-plus threads, and response time goes for a toss.
machine.config is set to autoConfig.
There are no explicit threads in the application, though there is some (very fancy) use of Parallel.ForEach.
I'm looking for some tips on how to go about diagnosing this. Reducing the Parallel.ForEach usage seemed to help, though I have yet to conclusively prove it. Limiting the maximum number of threads also helps cap the thread count, but I think there is something wrong with the app that causes those threads to keep increasing, and I want to solve that.
In the picture below, the thread count is ONLY for the IIS worker processes. The PUT requests are the only ones doing real work; the GETs are mostly requests for static resources.
Can this be reproduced in a local or dev environment? If so, it's a good time to attach to the process and use the debugging tools to see which threads are managed and where they are in the code. If that fails to unveil anything, then it might be time to capture a memory dump from the process and dig into it with WinDbg.
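One thing worth checking while you diagnose (this is a guess based on your mention of Parallel.ForEach, not something your numbers prove): if the loop body makes blocking I/O calls, the thread pool will keep injecting threads because every worker looks stuck. A minimal sketch of capping the parallelism, with placeholder work items:

    using System;
    using System.Threading.Tasks;

    class Demo
    {
        static void Main()
        {
            int[] items = { 1, 2, 3, 4, 5 };       // placeholder work items

            // Unbounded Parallel.ForEach over blocking calls invites the
            // thread pool to add threads indefinitely; a cap bounds it.
            var options = new ParallelOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount
            };

            Parallel.ForEach(items, options, item =>
            {
                DoBlockingIo(item);                // stand-in for the real work
            });
        }

        static void DoBlockingIo(int item) =>
            System.Threading.Thread.Sleep(100);    // simulates a blocking call
    }

If capping it makes the symptom disappear, that points strongly at blocking work inside the parallel loops.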
The idea behind Intel's hyper-threading is (as far as I understand) that one core is used for two threads in a time-multiplexed manner.
The hardware supports this by duplicating the state-related resources and time-sharing the other resources. If the running thread stalls (e.g. because it has to fetch new data from RAM), the other thread gets access to the shared resources. The result is better utilization of the shared resources.
So if one thread isn't ready, the other thread is allowed to run. In other words, a thread switch can happen when the executing thread stalls.
I've tried to find out what happens if both threads are ready for a long time, but I haven't been able to find that information.
What happens if the running thread doesn't stall?
Will the running thread continue as long as it is ready?
Will the core switch to the other thread after some time? If so, what is the criterion for the switch? Is it controlled by hardware or software?
Hyper-threading is simultaneous multithreading (SMT), so it doesn't just switch back and forth at some relatively coarse grain (like stalls). In the case of Sandy Bridge and newer, the fetcher and the decoder alternate between the threads, and the execution units are shared competitively. So even if neither thread is stalling, the two together can still achieve better utilization than either running alone (though that's not typical). In short, the problems you identified don't apply, because it doesn't work like that in the first place.
We have a process that is computationally intensive. When it runs, it typically uses 99% of the available CPU. It is configured to take advantage of all available processors, so I believe this is OK. However, one of our customers is complaining because alarms go off on the server on which this process is running due to the high CPU utilization. I think there is nothing wrong with high CPU utilization per se: the CPU drops back to normal when the process stops running, and the process does run to completion (no infinite loops, etc.). I'm just wondering if I am on solid ground when I say that high CPU usage is not a problem per se.
Thank you,
Elliott
if I am on solid ground when I say that high CPU usage is not a problem per se
You are on solid ground.
We have a process that is computationally intensive
Then I'd expect high CPU usage.
The CPU drops back to normal when the process stops running
Sounds good so far.
Chances are that the systems your client is using are configured to send a notification when CPU usage goes over a certain limit, as this can sometimes indicate a problem (and sustained high usage can cause overheating and associated problems).
If this is expected behavior, your client needs to adjust their monitoring, but you need to ensure that the behavior really is as expected on their systems and is not likely to cause problems (i.e. that the high CPU usage is not sustained indefinitely).
An alarm going off is not by itself evidence of poor design. The real concern is whether the process chokes other tasks on the system. Modern OSes usually take care of this by lowering the dynamic priority of a CPU-hungry process so that less demanding processes get higher priority. You might tell the customer to "nice" the process when starting it, since you probably don't care whether it runs in 10 minutes or 12 minutes. Just my 2 cents :)
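For what it's worth, a minimal sketch of "nicing" the process from inside the application itself, in case the customer can't change how it is launched:

    using System.Diagnostics;

    class Startup
    {
        static void Main()
        {
            // Ask the OS to favor other processes when they compete for
            // CPU; BelowNormal still lets us soak up all idle cycles.
            Process.GetCurrentProcess().PriorityClass =
                ProcessPriorityClass.BelowNormal;

            // ... run the computationally intensive work here ...
        }
    }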
I've been thinking today about Node.js and its attitude towards blocking, and it got me thinking: if a block of code is purely non-blocking, say calculating some really long algorithm with all the variables present on the stack, etc., should this push a single non-hyperthreaded core to 100% (as Windows Task Manager defines it) as it aims to complete the task as quickly as possible? Say this is a calculation that can take minutes.
Yes, it should. The algorithm should run as fast as it can. It's the operating system's job to schedule time to other processes if necessary.
If your non-blocking, computation-intensive code doesn't use 100% of the CPU, then you are wasting cycles in the idle task. It always irritates me to see the idle task using 99% of the CPU.
As long as the CPU is "given" to other processes when there are some that need it for their calculations, I suppose it's OK: why not use the CPU if it's available and there is work to do?
Since RAM can be paged out to disk, all applications are potentially blocking. If the algorithm uses more RAM than is available on the system, page faults will block on disk I/O, and as a result it won't hit 100%.
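If you want to observe the behavior described above, a trivial CPU-bound loop (in C#, purely illustrative) should pin one core at 100% in Task Manager while remaining fully preemptible:

    using System;

    class BusyLoop
    {
        static void Main()
        {
            // Pure computation, no I/O and no waits: one core should sit
            // at 100% until the loop finishes.
            double x = 0.0;
            for (long i = 1; i < 1_000_000_000; i++)
                x += 1.0 / i;          // arbitrary work to burn cycles
            Console.WriteLine(x);      // keeps the loop from being optimized away
        }
    }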
I have a theory regarding troubleshooting an asynchronous application (I'm using the CCR), and I wonder if someone can confirm my logic.
If a CCR-based multi-threaded application using the default number of threads (i.e. one per core) is slower than the same application with double the threads specified, does this mean that threads are being blocked somewhere in the code?
What do you think? Is this a quick and valid way to detect whether threads are being inadvertently blocked?
What do you mean by "slower"?
If you want to automatically detect blocked threads, perhaps those threads should send a heartbeat, which is then observed by a monitor of some sort, but your options are limited.
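A minimal sketch of that heartbeat idea (all names here are hypothetical): each worker stamps a shared timestamp, and a monitor flags workers that go quiet:

    using System;
    using System.Collections.Concurrent;
    using System.Threading;

    class HeartbeatMonitor
    {
        // Last heartbeat per worker id; workers write, the monitor reads.
        static readonly ConcurrentDictionary<int, DateTime> beats =
            new ConcurrentDictionary<int, DateTime>();

        // Each worker calls this at the top of its processing loop.
        public static void Beat(int workerId) =>
            beats[workerId] = DateTime.UtcNow;

        // Run this on its own thread; it reports workers that have not
        // beaten within the timeout and so may be blocked.
        public static void Watch(TimeSpan timeout)
        {
            while (true)
            {
                foreach (var kv in beats)
                    if (DateTime.UtcNow - kv.Value > timeout)
                        Console.WriteLine($"Worker {kv.Key} may be blocked.");
                Thread.Sleep(1000);
            }
        }
    }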
A cheap way to tell if threads are being blocked is to get the current system time before any potentially blocking operation, then again after the operation, and see how much time has elapsed. For example, while waiting for a message to arrive, measure how much time the thread was blocked waiting for the message.
Unless there are always more than enough messages to be processed, threads will block waiting for a message. If you have more threads, then you have more potential message generators (depending on your design), and thus threads waiting to receive messages will be more likely to find one ready.
Exactly one thread per CPU is too few unless you can guarantee that there will always be enough messages so that no thread has to wait.
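A minimal sketch of the timing approach described above (the wrapper and the threshold are my own invention, not a CCR facility):

    using System;
    using System.Diagnostics;

    static class BlockTimer
    {
        // Wrap any potentially blocking call to see how long the thread waited,
        // e.g. var msg = BlockTimer.Timed("receive", () => inbox.Take());
        // where inbox is whatever queue/port you actually receive from.
        public static T Timed<T>(string label, Func<T> blockingCall)
        {
            var sw = Stopwatch.StartNew();
            T result = blockingCall();
            sw.Stop();
            if (sw.ElapsedMilliseconds > 10)   // arbitrary threshold for "blocked"
                Console.WriteLine($"{label} blocked for {sw.ElapsedMilliseconds} ms");
            return result;
        }
    }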
If this is the case, it means that your thread pool is being exhausted (i.e. you have 2 threads but you've async-pended 4 I/Os, or something like that). If your work is heavily I/O-bound, the rule of "one thread per core" doesn't really apply.
I've found that to keep the system fluid with minimal threads, I keep the tasks dealing with I/O as concise as possible. They simply post the data from the I/O to another Port and do no further processing. The data is therefore queued elsewhere for processing in a controlled manner, without interfering with the task of grabbing data as fast as possible. That processing might happen in the ExclusiveGroup of an Interleave if there's shared state to think about. A handy side effect is that exclusive tasks will never tie up all the threads in a Dispatcher (though I suspect there are probably neater ways of managing this in the CCR API).