I have a server running about 100+ WordPress sites of varying complexity and traffic volume. The OS is Windows 2003 Server running IIS 6 with the domains being managed via HELM. The thing is there are times when sites stop responding due to insufficient memory, but it has been difficult to track the particular site(s) or other culprit that could be the cause. What makes it even more complicated is that the problem will disappear for weeks and then show up again. The most recent solution was to migrate the sites to a higher capacity server and this seemed to have worked for some time.
What tools/techniques can I use to track down the problem while keeping in mind that this is a production server?
Tess Ferrandez has a number of great articles about tracking down memory pressure and process hangs in IIS using WinDbg and DebugDiag:
If it is broken, fix it you should
Whilst the techniques often focus on ASP.NET, many of them can be applied to other languages. The only problem is that, because PHP is written in native code, your WinDbg-fu will probably need to be fairly good.
Related
I am trying to load, edit and send back a text file through forms using the POST method.
It works well, but when the file exceeds a certain size (around 1200 KB), the program crashes because the request behaves as if it has no parameters.
What causes it to act like this? And how do I remove this limit?
Thanks for your help
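One thing worth checking, assuming the server is Tomcat (the question does not say so explicitly): Tomcat's HTTP Connector has a maxPostSize attribute that defaults to 2 MB, and when a form POST body exceeds it the parameters are not parsed, which matches the "no parameters" symptom. A text file grows when it is URL-encoded, so a file well under 2 MB on disk can still exceed the limit. The sketch below only estimates the encoded size; the field name "content" and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Tomcat's documented default for the Connector's maxPostSize attribute (2 MB).
TOMCAT_DEFAULT_MAX_POST_SIZE = 2 * 1024 * 1024


def encoded_form_size(fields):
    """Size in bytes of the application/x-www-form-urlencoded request body."""
    return len(urlencode(fields).encode("ascii"))


def exceeds_limit(fields, limit=TOMCAT_DEFAULT_MAX_POST_SIZE):
    """True if the encoded form body would be larger than the limit."""
    return encoded_form_size(fields) > limit


# Each CR/LF pair is 2 bytes on disk but 6 bytes once encoded (%0D%0A),
# so 800,000 raw bytes of line breaks already pass the 2 MB default.
small = {"content": "a b=c"}
large = {"content": "\r\n" * 400_000}
```

If this turns out to be the cause, the usual options are to raise maxPostSize on the Connector element in server.xml or to send the file as a multipart/form-data upload instead of a form field.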
visit http://www.mulesoft.com/tcat/tomcat-memory
There you can read how to avoid crashes in Tomcat 6 and fix the related issues.
Out Of Memory Errors, or OOMEs, are one of the most common problems faced by Apache Tomcat users. Generally, these errors occur during development, but can even occur on production servers that are experiencing an unusually high spike of traffic. Tomcat 7 includes fixes and workarounds to prevent some of the causes of OOMEs, but nothing substitutes a good understanding of why these errors occur.
This guide will help you understand why these errors are so prevalent and seemingly hard to fix, and show you how organizations using Apache Tomcat in enterprise production environments use Tcat to fix and avoid these errors.
OutOfMemoryError messages, or OOME's, are one of the most common problems that users experience with Tomcat. These messages can be caused by a wide variety of factors, and they severely affect application performance, so it's a good idea to do everything you can to prevent them before they occur. Here are the most common OOME-triggering situations, and steps to help you avoid them in the future. Tcat's management console gives you deep visibility into the memory stats of your Tomcat instances, allowing you to eliminate memory leaks and tune your servers more quickly than ever.
I have a Windows Server 2012 with IIS 8.0. It is hosting many small websites with a low user base which are not mission critical in any way. By small website I mean that the application code and memory footprint are quite low, but due to the loaded libraries, like EntityFramework, the memory consumption of each application is about 140 MB when freshly started and idle.
In general that’s not a big deal for a full-blown webserver, but I only have a VPS with 4GB of RAM which also runs several other applications (databases, BIND, hMail, etc.). I’m using it basically as development server to play with many different technologies. Therefore, I’m running out of RAM quickly while serving dozens of ~140MB w3wp’s.
Besides suspending sites when idle, I’d like to reduce the memory consumption while still using any framework or library I’d like to use – that’s the purpose of the whole thing actually.
Long story short: As the applications not only share the same .NET version but also some libraries like EF or MVC, would it make more sense to run multiple sites in one app_pool so that they can share the libs? Or would each site load its own copy anyway (due to different Application domains, as discussed here)?
Bonus question: when considering a hardware upgrade 1GB of RAM is 20$/month but putting the whole server on SSDs is 10$/month. While I do know that reading from page file is always much slower than reading from RAM I’m thinking about using a big pagefile on the SSD instead of buying 1gig of additional RAM for twice the price – again, speed of the websites isn’t critical, they should just work. Would that make any sense at all?
Looking at a w3wp Process (hosting multiple sites) in Process Explorer shows that it hosts several different application domains with different instances of the same assemblies loaded into memory. So moving the sites into a single AppPool may not help much.
But there is another option. In IIS 8+ you can share common assemblies across AppPools. If certain assemblies are used by multiple AppPools, they are loaded into memory just once and then aliased by the different processes.
Have a look at this bit from asp.net and this TechNet blog post
You have to do a little bit of setup work, but then it seems to work quite well.
All of my websites are hosted in IIS and configured with one application pool. This application pool hosts 10 running websites.
It was working fine until today, but all of a sudden I am seeing the CPU usage percentage swing up and down. I am unable to trace the problem.
Is there any way to check which website is generating the most load among all those in the application pool?
Performance counters, task manager and native code analysis tools only tell part of the story. To gain a deeper understanding of what is happening inside your ASP.NET application you need to use WinDBG, SOS and ADPlus.
Tess Ferrandez has a great series of articles on tracking down what is to blame here:
.NET Debugging Demos Lab 4: High CPU hang
.NET Debugging Demos Lab 4: High CPU Hang - Review
This is a real world example:
High CPU in .NET app using a static Generic.Dictionary
You will probably want to separate your sites into individual application pools so you can identify and isolate the site that is causing the high CPU (but it already looks like you have a suspect so I'd isolate that one). From then you can follow Tess's advice and guidance to track down the cause.
You should also take a look at the logs to see if you're experiencing an unexpected spike or increase in traffic. Perhaps there's a badly behaved search engine indexer nailing the site. If that's the case then maybe you need to (if you haven't already done so) create a robots.txt to prevent crawlers from indexing parts of the site that don't need to be indexed. On top of that, if certain crawlers are being overly promiscuous then just ban them. Perhaps consider a sitemap for Google to tame and tune its activities.
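As a rough sketch of the robots.txt idea, the snippet below defines a small rule set and checks it with Python's standard urllib.robotparser before it goes live. The crawler name "BadBot" and the /search/ and /admin/ paths are made up for illustration; substitute whatever the logs actually show.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: ban one crawler outright and keep every other
# crawler out of sections that do not need to be indexed.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /search/
Disallow: /admin/
"""


def build_parser(text):
    """Parse robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
banned_bot_allowed = parser.can_fetch("BadBot", "/index.html")
normal_page_allowed = parser.can_fetch("Googlebot", "/index.html")
search_page_allowed = parser.can_fetch("Googlebot", "/search/results")
```

Keep in mind that robots.txt is only advisory. A crawler that ignores it has to be blocked at the server or firewall level.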
If your server has reached its maximum capacity, you will see CPU go up and down erratically because the GC will start trying to recover resources (cache, etc.), which in turn causes your sites to work even harder. It's an endless cycle.
Have you been monitoring your performance counters? Do you have any idea what normal capacity is for your site? If you cannot answer these questions, I suggest you gather some perf numbers as soon as possible.
My rule of thumb is to always measure first, then make necessary changes.
Most of the time performance bottlenecks aren't where you think they would be.
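To illustrate "measure first", the sketch below derives a baseline from counter samples taken during a known-normal period and flags later samples that fall well outside it. The readings, the three-standard-deviation threshold and the function names are assumptions made for the example, not values from this thread.

```python
from statistics import mean, pstdev


def baseline_threshold(normal_samples, k=3.0):
    """Return mean + k standard deviations of samples from a normal period."""
    return mean(normal_samples) + k * pstdev(normal_samples)


def flag_outliers(samples, threshold):
    """Return the indexes of samples that exceed the threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


# Made-up "% Processor Time" readings exported from a counter log.
normal_period = [22, 25, 24, 23, 26, 24, 25, 23]
later_samples = [24, 26, 91, 25, 60]

threshold = baseline_threshold(normal_period)
suspicious = flag_outliers(later_samples, threshold)
```

The same approach works for requests per second, GC counters or private bytes. What matters is having numbers from a normal day to compare against.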
There is really no performance counter way to tell, because the CPU counters are at the process level. Your best bet would be to do a time correlation with other events in the event log and .NET/ASP.NET counters for garbage collection, requests etc.
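One way to do that time correlation is to total the time-taken field from the IIS W3C logs per site and per minute, then compare the busiest minutes with the moments the CPU spiked. The sketch below assumes the logs include the s-sitename and time-taken fields; the sample lines and the function name are made up. Note that time-taken is elapsed time in milliseconds, not CPU time, so it only hints at which site was busy.

```python
from collections import defaultdict


def busiest_minutes(log_lines, top=3):
    """Sum time-taken per (site, date, minute) and return the largest totals."""
    fields = []
    totals = defaultdict(int)
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns used by the lines that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        minute = row["time"][:5]  # "HH:MM"
        key = (row["s-sitename"], row["date"], minute)
        totals[key] += int(row["time-taken"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


# Made-up sample in W3C extended log format.
SAMPLE_LOG = """\
#Fields: date time s-sitename cs-uri-stem sc-status time-taken
2010-05-01 10:15:02 W3SVC1 /default.aspx 200 120
2010-05-01 10:15:40 W3SVC2 /report.aspx 200 9500
2010-05-01 10:15:55 W3SVC2 /report.aspx 200 8700
2010-05-01 10:16:03 W3SVC1 /default.aspx 200 95
""".splitlines()

ranking = busiest_minutes(SAMPLE_LOG)
```

If each site writes to its own log folder, the site can also be taken from the folder name instead of the s-sitename field.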
If you really want to go hardcore, you could use the SysInternals toolset to take snapshots of your app pool over time and then do a post-analysis to figure out what code was executed when the spike happened. Here is a related example from Mark Russinovich's blog - http://blogs.technet.com/b/markrussinovich/archive/2008/04/07/3031251.aspx.
Some background info:
We have several websites running on a 64-bit machine with IIS6
These websites all have the same core code, but different skins and content
We have a SQL 2005 database which is fairly heavily used throughout the site
Historically we've used SQL stored procs, but have been gradually transitioning to NHibernate. The majority of our code uses NHibernate now, but not all.
These sites have been running fine on our live web server for a while, although we get a few errors a day regarding SQL connectivity / deadlocking.
Last Thursday we noticed the sites going very slow, then checking task manager revealed one of the websites was hogging over 1.6Gb of memory. Ever since then we've been restarting the app and watching it slowly increase in size over the course of the day.
We apparently have a memory leak (or at least, that's the effect), but I'm losing hair trying to work out how to trace it.
It only appears to be happening on this one website, even though as far as I am aware nothing had changed in the code before it started happening. It is, however, our busiest website so it could be a traffic issue.
Debug Diagnostics hasn't revealed any issues.
Refreshing certain pages very quickly causes the memory to jump up rapidly, then fall slightly, but all the time the gradual progression is upwards.
I cannot replicate the issue on our test servers or locally. Probably because the traffic has something to do with it.
My suspicion is that the problem lies in database connectivity / locking. However, I'm not sure how that would cause the problem specified.
Any ideas?
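When memory climbs gradually like this, it can help to put a number on the growth so that a real leak can be told apart from normal warm-up. The sketch below fits a straight line through periodic memory readings and estimates how long until a given size is reached. The readings are invented, and the 1600 MB limit is only a round figure based on the 1.6Gb mentioned above.

```python
def growth_rate(samples):
    """Least-squares slope of (hours, megabytes) samples, in MB per hour."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    denominator = sum((x - mean_x) ** 2 for x, _ in samples)
    return numerator / denominator


def hours_until(samples, limit_mb):
    """Estimated hours from the last sample until limit_mb is reached."""
    rate = growth_rate(samples)
    if rate <= 0:
        return None  # memory is flat or shrinking, no projection possible
    _, last_mb = samples[-1]
    return (limit_mb - last_mb) / rate


# Invented readings: (hours since restart, process size in MB).
readings = [(0, 400), (2, 600), (4, 800), (6, 1000)]

rate = growth_rate(readings)
remaining = hours_until(readings, 1600)
```

A slope that stays positive across several restarts, regardless of the time of day, points to a leak rather than to traffic peaks.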
Edit
Okay so not exactly sure I've found the problem but we're getting closer. It's definitely SQL related. The error log reveals lots of errors since last Thursday.
It all happened after we ran some windows updates on our servers. One of the updates failed on the SQL server so not sure if this caused some problems.
The warnings we're getting are:
SQL Server has encountered XX occurrence(s) of I/O requests taking longer than 15 seconds to complete on file .. tempdb.mdf
Where XX is anything between 17 and 90! Does that sound like a deadlocking issue?
Followed by the following errors:
Unable to complete login process due to delay in opening server connection
These coincide with our log times for when the websites have been "blipping".
We've increased the page file size on SQL server to the recommended size, as it was set to a max of 4Gb, but recommended was 12Gb. I think we may need to roll back the windows updates we did on Thursday if that doesn't fix it.
Unfortunately I can't get into Activity monitor as it tells me Timeout expired!
Edit
Okay after a reboot I'm into Activity monitor. How many sleeping processes would you say would be normal? We have roughly 127 sleeping. That's serving over 10 websites.
If there is a deadlock or timeout issue, will NHibernate not clean up its connections properly?
Okay so in the end it seems it's quite complex: SQL deadlocks and data problems, made worse, it seems, by anti-virus software that was locking up or choking on a file.
Turning off the anti-virus reduced the problems, but we still need to resolve the underlying data issues.
Anyone got tips for diagnosing SharePoint / ASP.Net "Request Timed Out" messages?
We've recently taken on the support and development of a client's MOSS public facing website. We've recreated a version of the site (a manual process - no Solutions here!) on 3 separate dev servers and are experiencing extremely slow warmup times. I'm used to waiting up to a minute after an IIS Reset but we are having to go through 2 Asp.Net "Request Timed Out" error messages. In general the site seems to be taking about 5 minutes to load up. Try doing custom development against that!
The strange thing is that on the staging and production servers the site takes about 40 seconds to warm up. They are slightly more powerful servers with a separate DB server but I wouldn't have thought the difference should be that great? I don't have any trouble with other SharePoint sites on my dev servers - just this one. It does contain a lot of custom code and DLLs so I understand that it may take a little longer to load these up but 5 minutes seems ridiculous.
The servers I'm testing this on are SharePoint 2007 (Feb CU), Win2003/IIS6, SQL 2005.
Does anyone have any tips for diagnosing the bottleneck here? I'm not sure if this is expected behaviour or a problem somewhere in the stack?
Cheers,
James.
Have you run any performance monitoring over the servers? This is essential for finding where the bottlenecks are. See here and here for recommendations.
If custom code has been deployed, check for an unusually high exception count or garbage collection/memory leak problems. This is most likely to be where the problem is. The best way to narrow this down is with a tool such as ANTS Profiler which will show memory leaks and performance issues. You could also turn on ASP.NET tracing and set debug="true" in web.config to get some idea of which code is executing slowly (although with all those timeouts this might not be so helpful).
Also do you know if any regular maintenance was performed on the SQL Server? (See some tips here.) Has SharePoint SP2 been installed (this performs some reindexing for you)?