We have a requirement to handle 10,000 concurrent users.
Let me explain the system. The machine has two processors. processModel in machine.config is set to autoConfig = true, which makes maxWorkerThreads = 20 (per CPU).
When I run my load test case with 30 users and watch CPU usage, it maxes out at 100%, and the number of threads on w3wp.exe is more than 70, even though my default is 20 * 2 (CPUs) = 40.
Once the CPU touches 100%, most of the transactions fail or take a very long time to respond.
Now the questions:
1. How do I get 30 more threads assigned to the same worker process?
2. How can I reduce CPU usage here?
You have a bit of an issue here. Increasing the number of threads will further increase CPU usage. (Your two goals are incompatible.) Not only are you asking each CPU to do more, but you'll also add context-switching overhead.
You could investigate using a non-blocking IO library, which would essentially mean only 1-2 threads per CPU. However, this could be a significant architecture change to your project (probably not feasible), and what you might actually find is that most of the CPU time is spent on the work your code performs, not on anything threading-related.
It sounds like you need to do some performance tuning and optimization of your application.
You should take a look at making async calls so that your threads are not tied up while waiting for a response.
http://msdn.microsoft.com/en-us/magazine/cc163463.aspx
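As a rough sketch of the difference (illustrative only: the class name and backend URL are invented, and this uses the newer async/await style rather than the Begin/End pattern the linked article describes):
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

class BackendClient
{
    static readonly HttpClient client = new HttpClient();

    // Blocking version: the worker thread counts as "busy" for the whole round
    // trip to the backend even though it is really just waiting.
    public string GetReportBlocking()
    {
        using (var web = new WebClient())
        {
            return web.DownloadString("http://backend.example.com/report");
        }
    }

    // Async version: the thread goes back to the pool while the request is in
    // flight, and work resumes only when the response arrives.
    public async Task<string> GetReportAsync()
    {
        return await client.GetStringAsync("http://backend.example.com/report");
    }
}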
We have migrated our thread-per-connection communication model to an async-IO-based TCP server using boost::asio. The reason for this change is that the old model didn't scale well enough. We permanently have about 2k persistent connections on average, with a tendency to keep growing on a monthly basis.
My question is: what would be the ideal number of worker threads polling the io_service queue for completion handlers - the number of virtual CPU cores?
Choosing a small number can lead to situations where the server does not consume the queue quickly enough and can't cope with the rate at which clients send messages.
Does it make sense to add worker threads dynamically in such situations?
Update:
It is probably just my implementation, but I find this statement from the Boost.Asio documentation confusing:
Implementation strategies such as thread-per-connection (which a synchronous-only approach would require) can degrade system performance, due to increased context switching, synchronisation and data movement among CPUs. With asynchronous operations it is possible to avoid the cost of context switching by minimising the number of operating system threads — typically a limited resource — and only activating the logical threads of control that have events to process.
Because if you have X threads pumping completion events on a machine that has X cores: 1) you don't have any guarantee that each thread gets a dedicated CPU, and 2) if my connections are persistent, I don't have any guarantee that the thread that, say, issues an async_read will be the same one that executes the completion handler.
void Connection::read() {
    // Start an asynchronous read of the next 18-byte message into the buffer.
    boost::asio::async_read(socket, boost::asio::buffer(buffer, 18),
        boost::bind(&Connection::handleRead, shared_from_this(),
                    boost::asio::placeholders::error,
                    boost::asio::placeholders::bytes_transferred));
}
void Connection::handleRead(const boost::system::error_code &error,
                            std::size_t bytes_transferred) {
    // processing of the bytes
    ...
    // keep processing on this socket by issuing the next read
    read();
}
In an ideal situation with perfectly non-blocking I/O, a working set that fits entirely in L1 cache, and no other processes in the physical system, each thread will use the entire resources of a processor core. In such a situation, the ideal number of threads is one per logical core.
If any part of your I/O is blocking then it makes sense to add more threads than the number of cores so that no cores sit idle. If half the thread time is spent blocked then you should have almost 2 threads per core. If 75% of the thread time is spent blocked then you should have 3 or 4 per core, and so on. Context switching overhead counts as blocking for this purpose.
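A rough rule of thumb consistent with those numbers (my own back-of-the-envelope formula, not an official guideline): threads per core ≈ 1 / (1 - fraction of time spent blocked). With 50% blocking that gives 2 threads per core, with 75% blocking it gives 1 / 0.25 = 4, and purely CPU-bound work (0% blocked) gives 1 thread per core.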
I've noticed that when Microsoft has to make a blind guess about this, they tend to go for two or four threads per core. Depending on your budget for making this determination I would either go with 2 or 4, or start with one thread per core and work my way up, measuring throughput (requests serviced / second) and latency (min, max, and average response time) until I hit the sweet spot.
Dynamically adjusting this value only makes sense if you are dealing with radically different programs. For a predictable workload there is a sweet spot for your hardware that should not change much even with increasing amounts of work to be done. If you are making a general purpose web server then dynamic adjustment is probably necessary.
I have an IIS application which behaves like this: the total number of threads in the IIS processes starts out low, traffic comes in at some low rate like 5 rpm, and the number of threads starts increasing alarmingly. It keeps growing even after the load stops, does not come down in a reasonable time, reaches 30,000-plus threads, and response time goes for a toss.
machine.config is set to autoConfig.
There are no explicit threads in the application, though there is some "very fancy" use of parallel foreach.
I am looking for tips on how to go about diagnosing this. Reducing the parallel foreach usage seemed to help, though I have yet to prove that conclusively. Limiting the maximum number of threads also helps cap the thread count, but I think there is something wrong with the app that causes those threads to keep increasing, and that is what I want to solve.
In the picture below, the thread count is ONLY for the IIS worker processes. The PUT requests are the only ones doing real work; the GETs are mostly static resource requests.
Can this be reproduced in a local or dev environment? If so, it's a good time to attach to the process and use the debugging tools to see which threads are managed and where they are in the code. If that fails to unveil anything, then it might be time to capture a memory dump of the process and dig into it with WinDbg.
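If the parallel foreach turns out to be involved (assuming it is the TPL's Parallel.ForEach; the WorkItem type and ProcessItem method below are invented placeholders), one thing worth trying is capping its degree of parallelism so blocked iterations cannot keep pulling new threads into the pool. A minimal sketch:
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class WorkItem { public string Payload; }

class BulkProcessor
{
    public void ProcessAll(IEnumerable<WorkItem> items)
    {
        var options = new ParallelOptions
        {
            // Cap the number of concurrent workers instead of letting the loop
            // fan out while individual iterations block on I/O.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };
        Parallel.ForEach(items, options, item => ProcessItem(item));
    }

    // Stand-in for whatever the real application does per item.
    void ProcessItem(WorkItem item) { /* ... */ }
}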
I am load testing a .NET 4.0 MVC application hosted on IIS 7.5 (default config, in particular processModel autoConfig=true), and am observing odd behavior in how .NET manages the threads.
http://msdn.microsoft.com/en-us/library/0ka9477y(v=vs.100).aspx mentions that "When a minimum is reached, the thread pool can create additional threads or wait until some tasks complete".
It seems that the duration for which threads are blocked plays a role in whether the pool creates new threads or waits for tasks to complete, and the result is not necessarily optimal throughput.
Question: Is there any way to control that behavior, so threads are generated as needed and request queuing is minimized?
Observation:
I ran two tests on a test controller action that does not do much besides Thread.Sleep for an arbitrary time.
50 requests/second with the page sleeping 1 second
5 requests/second with the page sleeping for 10 seconds
In both cases .NET would ideally use 50 threads to keep up with the incoming traffic. What I observe is that in the first case it does not do that; instead it chugs along executing some 20-odd requests concurrently, letting the incoming requests queue up. In the second case threads seem to be added as needed.
Both tests generated traffic for 100 seconds. Here are corresponding perfmon screenshots.
In both cases the Requests Queued counter is highlighted (note the 0.01 scaling)
50/sec Test
For most of the test, 22 requests are handled concurrently (turquoise line). As each takes about a second, that means almost 30 requests/sec queue up until the test stops generating load after 100 seconds, and the queue then gets slowly worked off. Briefly the concurrency jumps to just above 40, but never to 50, the minimum needed to keep up with the traffic at hand.
It is almost as if the thread pool management algorithm determines that it doesn't make sense to create new threads, because it has a history of ~22 tasks completing (i.e. threads becoming available) per second, completely ignoring the fact that it has a queue of some 2800 requests waiting to be handled.
5/sec Test
Conversely, in the 5/sec test threads are added at a steady rate (red line). The server falls behind initially and requests do queue up, but never more than 52, and eventually enough threads are added for the queue to be worked off, with more than 70 requests executing concurrently, even while load is still being generated.
Of course the workload is higher in the 50/sec test, as 10x the number of HTTP requests is being handled, but the server has no problem at all handling that traffic once the thread pool is primed with enough threads (e.g. by running the 5/sec test first).
It just seems unable to deal with a sudden burst of traffic, because it decides not to add any more threads to deal with the load (it would rather throw 503 errors than add more threads in this scenario, it seems). I find this hard to believe, as a burst of 50 requests/second is surely something IIS should be able to handle on a 16-core machine. Is there some setting that would nudge the thread pool towards erring slightly more on the side of creating new threads, rather than waiting for tasks to complete?
Looks like it's a known issue:
"Microsoft recommends that you tune the minimum number of threads only when there is load on the Web server for only short periods (0 to 10 minutes). In these cases, the ThreadPool does not have enough time to reach the optimal level of threads to handle the load."
This exactly describes the situation at hand.
Solution: slightly increase minWorkerThreads in machine.config to handle the expected traffic burst. (The setting is per CPU, so a value of 4 gives us 64 threads on the 16-core machine.)
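For reference, a minimal sketch of roughly what that change looks like in machine.config (the value 4 is just the one from this scenario; double-check how the attribute interacts with autoConfig on your framework version):
<system.web>
  <processModel autoConfig="true" minWorkerThreads="4" />
</system.web>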
We have a process that is computationally intensive. When it runs, it typically uses 99% of the available CPU. It is configured to take advantage of all available processors, so I believe this is OK. However, one of our customers is complaining because alarms go off on the server on which this process is running because of the high CPU utilization. I think there is nothing wrong with high CPU utilization per se. The CPU drops back to normal when the process stops running, and the process does run to completion (no infinite loops, etc.). I'm just wondering if I am on solid ground when I say that high CPU usage is not a problem per se.
Thank you,
Elliott
if I am on solid ground when I say that high CPU usage is not a problem per se
You are on solid ground.
We have a process that is computationally intensive
Then I'd expect high CPU usage.
The CPU drops back to normal when the process stops running
Sounds good so far.
Chances are that the systems your client is using are configured to send a notification when CPU usage goes over a certain limit, as sometimes this is indicative of a problem (and sustained high usage can cause overheating and associated problems).
If this is expected behavior, your client needs to adjust their monitoring - but you need to ensure that the behavior is as expected on their systems and that it is not likely to cause problems (ensure that high CPU usage is not sustained).
An alarm going off is not, by itself, a sign of poor design. The real concern may be that the process chokes other tasks on the system. Modern OSes usually take care of this by lowering the dynamic priority of a CPU-hungry process so that other, less demanding tasks get higher priority. You may tell the customer to "nice" the process to start with, since you probably don't care whether it runs in 10 minutes or 12 minutes. Just my 2 cents :)
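On Windows, the rough equivalent of nice-ing the process can be done when launching it (e.g. start /belownormal yourprocess.exe from a command prompt) or from the process itself. A minimal sketch, assuming lowering the priority is acceptable for this workload:
using System.Diagnostics;

class Program
{
    static void Main()
    {
        // Lower this process's scheduling priority so other, less CPU-hungry
        // work on the box still gets CPU time promptly.
        Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.BelowNormal;

        // ... run the computationally intensive work as before ...
    }
}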
I have a multi-threaded web application with about 1000~2000 threads in the production environment.
I expect the CPU usage to show up under w3wp.exe, but instead the System Idle Process eats the CPU. Why?
The Idle process isn't actually a real process, and it doesn't "eat" your CPU time; the %CPU you see next to it is simply unused CPU (more or less).
The reason for the poor performance of your application is most likely your 2000 threads. Windows (or indeed any operating system) was never meant to run so many threads at a time. You're wasting most of the time just context switching between them, with each thread getting only a roughly 15 ms slice of processing time every ~30 seconds (15 ms * 2000 = 30 s!).
Rethink your application.
The idle process is simply holding processor time until a program needs it; it's not actually eating any cycles at all. You can think of the system idle time as "available CPU".
System Idle Process is not a real process, it represents unused processor time.
This means that your app doesn't utilize the processor completely; it may be memory-bound or I/O-bound, and possibly the threads are waiting for each other or for external resources. Context-switching overhead could also be a culprit: unless you have 2000 cores, the threads are not actually all running at the same time, but are assigned time slices by the task scheduler, which also takes some time.
You have not provided a lot of detail, so I can only speculate at this point. I would say it is likely that most of those threads are doing nothing. The ones that are doing something are probably IO-bound, meaning that they are spending most of their time waiting for an external resource to respond.
Now let's talk about the "1000~2000 threads". There are very few cases (maybe none) where having that many threads is a good idea, and I think your current issue is a perfect example of why. Most of those threads are (apparently, anyway) doing nothing but wasting resources. If you want to process multiple tasks in parallel, especially if they are IO-bound, then it is better to take advantage of pooled resources such as the ThreadPool or the Task Parallel Library.
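As an illustration only (a minimal sketch; the Downloader class, FetchAsync, and the URLs are invented stand-ins for whatever external resources the real threads wait on), queuing IO-bound work as tasks lets a handful of pool threads service many outstanding operations:
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

class Downloader
{
    static readonly HttpClient client = new HttpClient();

    // Awaiting the response does not tie up a dedicated thread; the pool thread
    // is free to run other work until the data arrives.
    static Task<string> FetchAsync(string url)
    {
        return client.GetStringAsync(url);
    }

    static void Main()
    {
        // Hypothetical work list; the real application would supply its own.
        var urls = new List<string> { "http://example.com/a", "http://example.com/b" };

        // Many of these can be in flight at once while only a few pool threads run.
        Task<string>[] downloads = urls.Select(FetchAsync).ToArray();
        Task.WaitAll(downloads);
    }
}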