I have an IIS application that behaves like this: the total number of threads in the IIS worker processes starts low, traffic comes in at some low rate like 5 rpm, and the thread count starts increasing alarmingly. It keeps climbing even after the load stops, does not come down in any reasonable time, reaches 30,000-plus threads, and response time goes for a toss.
machine.config is set to autoConfig.
There are no explicit threads in the application, though there is some very fancy use of Parallel.ForEach.
I am looking for tips on how to go about diagnosing this. Reducing the Parallel.ForEach usage seemed to help, though I have yet to prove it conclusively. Limiting the maximum number of threads also caps the thread count, but I suspect there is something wrong with the app that causes those threads to keep increasing, and that is what I want to solve.
In the picture below, the thread count is ONLY for the IIS worker processes. The PUT requests are the only ones doing real work; the GETs are mostly requests for static resources.
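For what it's worth, here is a minimal sketch of the kind of Parallel.ForEach usage involved and the cap that seemed to help (ProcessItem is just a placeholder for whatever the PUT handler actually does):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class PutWorkProcessor
{
    // Placeholder for the real per-item work; in our case each iteration blocks
    // on I/O, which encourages the thread pool to keep injecting new threads
    // when the loop is unbounded.
    private static void ProcessItem(string item)
    {
        // ... blocking work here ...
    }

    public static void ProcessAll(IEnumerable<string> items)
    {
        var options = new ParallelOptions
        {
            // The "limiting max number of threads" workaround mentioned above:
            // without this cap, Parallel.ForEach will use as many threads as
            // the pool is willing to create.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.ForEach(items, options, ProcessItem);
    }
}
```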
Can this be reproduced in a local or dev environment? If so, it's a good time to attach to the process and use the debugging tools to see which threads are managed and where they are in the code. If that fails to reveal anything, it might be time to capture a memory dump of the process and dig into it with WinDbg.
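If it does come down to a dump, a rough outline of the WinDbg/SOS commands I'd start with (the exact SOS module name depends on the CLR version):

```
$$ Load SOS for .NET 4.x (use "mscorwks" instead of "clr" for CLR 2.0/3.5)
.loadby sos clr
$$ Worker/IOCP thread counts and the request queue length
!threadpool
$$ List the managed threads, then dump the managed stack of every thread
!threads
~*e !clrstack
```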
I just want to understand how we should plan the capacity of a NiFi instance.
We have a NiFi instance with around 500 flows, so the total number of processors enabled on the NiFi canvas is around 4,000. We run 2-5 flows simultaneously, and none of them takes more than half an hour, i.e. we process data on the order of MBs.
It was working fine until now, but we are seeing OutOfMemory errors very often. So we increased the -Xms and -Xmx parameters from 4g to 8g, which has resolved the problem for now. But going forward we will have more flows, and we may face OutOfMemory issues again.
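For reference, the heap settings we changed live in NiFi's conf/bootstrap.conf (the java.arg.* numbering may differ between installs):

```
# conf/bootstrap.conf
java.arg.2=-Xms8g
java.arg.3=-Xmx8g
```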
So, can anyone help with a capacity-planning matrix, or any suggestions for avoiding such issues before they happen? For example: if we have 3,000 processors enabled, with or without any processing, then X GB of memory is required.
Any input on NiFi capacity planning would be appreciated.
Thanks in Advance.
OOM errors can occur due to specific memory-consuming processors. For example, SplitXML loads your whole record into memory, so it could end up loading a 1 GiB file.
Each processor can document what resource considerations should be taken into account. All of the Apache processors (as far as I can tell) are documented in that manner, so you can rely on that.
In our example, by the way, SplitXML can be replaced with SplitRecord, which doesn't load the whole record into memory.
So even if you use 1,000 processors simultaneously, they might not consume as much memory as one processor that loads your whole FlowFile's content into memory.
Check which processors you are using and make sure you don't use one like that (there are more like this one that load the whole document into memory).
We are using ASP.NET, but sometimes our applications use a great deal of CPU or RAM. I want to restrict resources for every virtual directory/web site and get an alert when they reach the alert level.
I don't know whether I can do this in IIS, but I wonder: is this possible on other web servers like Apache?
In Java, you cannot have the Java Virtual Machine enforce CPU time limits for certain threads: the best you can do is set the thread priority for any threads that you create. If there is no other work to be done, then a thread will burn as many CPU cycles as it can get unless it is blocking on something (I/O, a synchronization monitor, etc.).
Using JMX, you can get some information about threads to possibly detect some runaway-thread situation, but you can't directly control the amount of CPU Time allowed for the thread.
If you are willing to re-architect your dynamic-content and other services, you could write them in such a way that they support a "unit of work" that is somehow less than the full request's worth of work, and then have your request-processing thread execute a unit of work and then sleep an appropriate amount of time. You can lessen your CPU usage by doing this, but you will also certainly slow down response times measurably.
Is there anything wrong with a fully-utilized CPU if your users are happy? Perhaps the real solution to your problem is more or bigger hardware.
I have inherited a somewhat complex system (and problem) that I need help with.
I have a webserver w/ the following specs:
Hardware:
Server 2003 32bit
IIS 6
8 cores (16 w/ hyperthreading)
12gb RAM
ASP.NET site
3 app pools, so 3 instances of w3wp.exe running.
This system serves a large number of people and bandwidth is fairly constant during business hours, reaching ~68,000 kbit/s.
There are moments when the system "comes down": the site gets very slow, which generates a lot of phone calls. Things usually slow down for about 60 seconds, but the duration has varied greatly. Sometimes it is only a few seconds and sometimes 3 minutes or more.
I have my app pools set to recycle at somewhere around 600 MB of consumed memory. That's not exact, but they recycle on their own successfully. At times I recycle the "main" pool manually to clear the problem I'm describing.
This is what I know is going on when things are running slow.
Network bandwidth takes a considerable dip.
Requests Queued in the ASP.NET performance counters goes up.
In tandem with Requests Queued rising, page latency increases. (I employ a simple ASP page that makes a SQL call and just says "The system is live"; this page is monitored for latency.)
Overall CPU usage rises.
Overall memory consumption of w3wp.exe rises.
In my mind here is what I imagine is happening.
Someone asks the system to generate a report or a glob of data. This spins up work that consumes a large number of threads (i.e., all available threads). This causes all other requests to the system to wait in the ASP.NET request queue, which essentially kills the site. The lack of activity causes the network traffic to dip.
I have read many articles about thread queues, thread pools, etc. This is a good example: http://williablog.net/williablog/post/2008/12/02/Increase-ASPNET-Scalability-Instantly.aspx and it contains what I believe is a clue to solving my problem... but I'm not sure. The machine.config file for the version of ASP.NET I am using does not specify any of the thread values listed in the article, so we are at the defaults for everything, which I believe is wrong given our situation.
If you were me; What would you do next? Where do you think the problem is?
edit: Here is a screenshot. It should be obvious when the problem is happening.
http://i.imgur.com/5BJlq.png
edit:
I want to change these values for our setup. A few questions first:
1) After making the changes, what needs to be restarted for them to take effect?
2) How do these settings look for a system with 8 physical cores?
maxconnection = 96
maxIoThreads = 100
maxWorkerThreads = 100
minFreeThreads = 704
minLocalRequestFreeThreads = 608
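For clarity, here is roughly how I understand those values would map into machine.config for 8 cores (a sketch based on the article's formulas; as far as I can tell, autoConfig has to be set to false for manual processModel values to take effect):

```xml
<configuration>
  <system.net>
    <connectionManagement>
      <!-- maxconnection: outbound connections per remote host (12 * 8 cores = 96) -->
      <add address="*" maxconnection="96" />
    </connectionManagement>
  </system.net>
  <system.web>
    <!-- autoConfig must be false, otherwise these values are ignored -->
    <processModel autoConfig="false"
                  maxWorkerThreads="100"
                  maxIoThreads="100" />
    <!-- minFreeThreads = 88 * 8 cores, minLocalRequestFreeThreads = 76 * 8 cores -->
    <httpRuntime minFreeThreads="704"
                 minLocalRequestFreeThreads="608" />
  </system.web>
</configuration>
```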
Not fun.
Many root causes share common symptoms, which makes this difficult to diagnose without getting your hands dirty in the application. :) Pardon me if some of these steps were implied.
Some next steps might be:
Review the IIS logs of each site looking for things like:
HTTP response codes (5xx,4xx,3xx)
Request response times
Review Windows Event Logs
How often are application pools recycling?
Application errors, etc.
Verify processModel settings, as suggested by @vinayc, to make sure your predecessor didn't get 'tricky'.
Install DebugDiag; it's a surprisingly good tool for some basic analysis of memory and crash-related problems.
This can also help you capture memory snaps to diagnose later.
Tess Ferrandez's blog can help you make heads or tails of memory snap analysis.
Understand how many web applications are running in each AppPool.
Investigate using a 'web garden' to possibly help minimize the number of users impacted by the 'slow down'.
Is a virus scanner enabled? Is it running? If so, verify exclusions.
Are application teams available to help troubleshoot? Identify if they have any custom application instrumentation that might help diagnose problem.
Is the behavior 'new'? Or has it always been there? If 'new', can you track down which deployment might have caused the new behavior?
Could the description given of the 'slow down' behavior be attributed to an app pool recycle and the resulting re-JITting of the application? A la the first-request syndrome.
Reviewing the logs helps you understand how the sites/applications are being used, which can be especially important if you don't own the codebase. LogParser is an excellent tool for doing some IIS log analysis (as well as other formats).
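For example, a couple of quick LogParser queries to get a feel for status codes and slow URLs (file paths and fields are illustrative; time-taken must be enabled in the W3C logging fields):

```
LogParser.exe -i:IISW3C "SELECT sc-status, COUNT(*) AS Hits FROM ex*.log GROUP BY sc-status ORDER BY Hits DESC"
LogParser.exe -i:IISW3C "SELECT TOP 20 cs-uri-stem, AVG(time-taken) AS AvgMs FROM ex*.log GROUP BY cs-uri-stem ORDER BY AvgMs DESC"
```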
Good luck!
Z
The settings you are talking about are part of the processModel element under the system.web element in machine.config. For IIS 6, the following are applicable:
autoConfig
maxIoThreads
maxWorkerThreads
minIoThreads
minWorkerThreads
requestQueueLimit
responseDeadlockInterval
Typically, you will only find autoConfig="true" and not the other attributes. Auto-config sets the values according to your machine configuration; the tuning follows the recommended values (see the "Threading Explained" section of this article), which are the same as those cited by the link you have provided.
The article, although dated, is an excellent resource if you want to tune these settings manually.
On the other hand, at the load you are serving, I would recommend two things (if you haven't tried them already):
Use output caching aggressively. Even if the data is dynamic, caching for, say, 30-60 seconds can give a definite boost at your load (see the sketch after this list).
If you suspect certain requests are hogging too many threads, try moving those resources to a different app pool (you can use a different web site with a different sub-domain, or a different virtual directory/application, and choose a different app pool).
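For the first point, a rough sketch of what I mean by output caching (WebForms-era; the declarative equivalent is the <%@ OutputCache %> page directive):

```csharp
using System;
using System.Web;
using System.Web.UI;

public partial class ReportPage : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Programmatic equivalent of <%@ OutputCache Duration="60" VaryByParam="none" %>:
        // serve the cached response for 60 seconds instead of re-running the
        // expensive work for every request.
        Response.Cache.SetExpires(DateTime.UtcNow.AddSeconds(60));
        Response.Cache.SetCacheability(HttpCacheability.Public);
        Response.Cache.SetValidUntilExpires(true);
    }
}
```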
I have a multi-threaded web application with about 1000~2000 threads in the production environment.
I expect the CPU usage to show up in w3wp.exe, but the System Idle Process eats the CPU. Why?
The Idle process isn't actually a real process; it doesn't "eat" your CPU time. The %CPU you see next to it is actually unused CPU (more or less).
The poor performance of your application is most likely due to your 2000 threads. Windows (or indeed any operating system) was never meant to run that many threads at a time. You waste most of the time just context switching between them, with each thread getting a couple of milliseconds of processing time every ~30 seconds (15 ms * 2000 = 30 seconds!).
Rethink your application.
The Idle process is simply holding processor time until a program needs it; it's not actually eating any cycles at all. You can think of the System Idle time as "available CPU".
System Idle Process is not a real process, it represents unused processor time.
This means that your app doesn't utilize the processor completely. It may be memory-bound or I/O-bound; possibly the threads are waiting for each other, or for external resources? Context-switching overhead could also be a culprit: unless you have 2000 cores, the threads are not actually running all at the same time but are assigned time slices by the scheduler, and that switching also takes time.
You have not provided a lot of details, so I can only speculate at this point. I would say it is likely that most of those threads are doing nothing. The ones that are doing something are probably I/O bound, meaning that they spend most of their time waiting for an external resource to respond.
Now let's talk about the "1000~2000 threads". There are very few cases (maybe none) where having that many threads is a good idea, and I think your current issue is a perfect example of why. Most of those threads are (apparently, anyway) doing nothing but wasting resources. If you want to process multiple tasks in parallel, especially if they are I/O bound, it is better to take advantage of pooled resources like the ThreadPool or the Task Parallel Library.
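For example, here is a sketch of the I/O-bound case using tasks rather than dedicated threads (the HttpClient calls are a hypothetical stand-in for whatever your threads are actually waiting on):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class Fetcher
{
    private static readonly HttpClient Client = new HttpClient();

    // 2000 requests in flight does not require 2000 threads: while the I/O is
    // pending no thread is consumed, and completions run on a handful of
    // thread-pool threads.
    public static async Task<string[]> FetchAllAsync(IEnumerable<string> urls)
    {
        IEnumerable<Task<string>> tasks = urls.Select(url => Client.GetStringAsync(url));
        return await Task.WhenAll(tasks);
    }
}
```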
I have a small WCF service that runs on an XP box with 256 MB of RAM, running in a VM.
When I make a request (with a request size of approximately 5 MB) to that service, I always get the following message in the event log:
aspnet_wp.exe was recycled because memory consumption exceeded the 153 MB (60 percent of available RAM).
and the call fails with error 500.
I've tried increasing the memory limit to 95%, but it still uses up all the available memory and fails in the same way.
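For context, the limit I changed is the memoryLimit attribute of the processModel element in machine.config; the default of 60 (percent of physical RAM) is what produces the 153 MB figure on this 256 MB box:

```xml
<system.web>
  <!-- aspnet_wp.exe is recycled once its memory use exceeds this percentage
       of physical RAM (default 60; I bumped it to 95 with no improvement). -->
  <processModel enable="true" memoryLimit="95" />
</system.web>
```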
It looks like something is wrong with my app (I do not reuse byte[] buffers, and maybe something else), but I cannot find the root cause of such memory overuse.
Profiling showed that all CLR objects that I have in memory together do not take up that much space.
Doing a dump analysis with WinDbg showed the same situation: nothing that big on the object heap.
How can I find out what is contributing to such memory overuse?
Is there any way to take a dump right before the process is recycled (at peak memory usage)?
Tess Ferrandez's blog "If broken it is, fix it you should" has lots of hints, tips and recommendations for sorting out exactly this sort of problem.
Of particular use to you would be Lab 3: Memory, where she walks you through working out what has caused all the memory on your machine to disappear.
It could be a lot of things; this one is hard to diagnose. Have you watched perfmon to see whether the memory usage peaks in the ASP.NET process or on the server itself? 256 MB is pretty low, but it should still be able to handle it. Do you have a swap file on this machine? At what point do you take the memory dump? Have you stepped through the code, and does it work on other machines? Perhaps it is getting stuck in a loop and leaking memory until it crashes?