Anyone got tips for diagnosing SharePoint / ASP.Net "Request Timed Out" messages?
We've recently taken on the support and development of a client's MOSS public-facing website. We've recreated a version of the site (a manual process - no Solutions here!) on 3 separate dev servers and are experiencing extremely slow warm-up times. I'm used to waiting up to a minute after an IIS reset, but here we have to sit through 2 ASP.NET "Request Timed Out" error messages first. In general the site seems to take about 5 minutes to load. Try doing custom development against that!
The strange thing is that on the staging and production servers the site takes about 40 seconds to warm up. They are slightly more powerful servers with a separate DB server but I wouldn't have thought the difference should be that great? I don't have any trouble with other SharePoint sites on my dev servers - just this one. It does contain a lot of custom code and DLLs so I understand that it may take a little longer to load these up but 5 minutes seems ridiculous.
The servers I'm testing this on are SharePoint 2007 (Feb CU), Win2003/IIS6, SQL 2005.
Does anyone have any tips for diagnosing the bottleneck here? I'm not sure if this is expected behaviour or a problem somewhere in the stack?
Cheers,
James.
Have you run any performance monitoring over the servers? This is essential for finding where the bottlenecks are. See here and here for recommendations.
If custom code has been deployed, check for an unusually high exception count or garbage collection/memory leak problems. This is most likely where the problem is. The best way to narrow it down is with a tool such as ANTS Profiler, which will show memory leaks and performance issues. You could also turn on ASP.NET tracing and set debug="true" in web.config to get some idea of which code is executing slowly (although with all those timeouts this might not be so helpful).
Also do you know if any regular maintenance was performed on the SQL Server? (See some tips here.) Has SharePoint SP2 been installed (this performs some reindexing for you)?
Related
I am trying to load, edit and send back a text file through forms with the POST method.
It works well, but when the file exceeds a certain size (around 1200 KB), the program crashes because the request behaves as if it had no parameters.
What causes it to act like this? And how do I remove this limit?
Thanks for your help
Visit http://www.mulesoft.com/tcat/tomcat-memory
There you can read how to avoid crashes in Tomcat 6 and fix the related issues.
Out Of Memory Errors, or OOMEs, are one of the most common problems faced by Apache Tomcat users. Generally, these errors occur during development, but they can also occur on production servers that are experiencing an unusually high spike of traffic. Tomcat 7 includes fixes and workarounds to prevent some of the causes of OOMEs, but nothing substitutes for a good understanding of why these errors occur.
This guide will help you understand why these errors are so prevalent and seemingly hard to fix, and show you how organizations using Apache Tomcat in enterprise production environments use Tcat to fix and avoid these errors.
OutOfMemoryError messages, or OOMEs, are one of the most common problems that users experience with Tomcat. These messages can be caused by a wide variety of factors, and they severely affect application performance, so it's a good idea to do everything you can to prevent them before they occur. Here are the most common OOME-triggering situations, and steps to help you avoid them in the future. Tcat's management console gives you deep visibility into the memory stats of your Tomcat instances, allowing you to eliminate memory leaks and tune your servers more quickly than ever.
Some background info:
We have several websites running on a 64-bit machine with IIS6
These websites all have the same core code, but different skins and content
We have a SQL 2005 database which is fairly heavily used throughout the site
Historically we've used SQL stored procs, but have been gradually transitioning to NHibernate. The majority of our code uses NHibernate now, but not all.
These sites have been running fine on our live web server for a while, although we get a few errors a day regarding SQL connectivity / deadlocking.
Last Thursday we noticed the sites going very slow, and checking Task Manager revealed that one of the websites was hogging over 1.6 GB of memory. Ever since then we've been restarting the app and watching it slowly increase in size over the course of the day.
We apparently have a memory leak (or at least, that's the effect), but I'm losing hair trying to work out how to trace it.
It only appears to be happening on this one website, even though as far as I am aware nothing had changed in the code before it started happening. It is, however, our busiest website, so it could be a traffic issue.
Debug Diagnostics hasn't revealed any issues.
Refreshing certain pages very quickly causes the memory to jump up rapidly, then fall slightly, but all the time the gradual progression is upwards.
I cannot replicate the issue on our test servers or locally. Probably because the traffic has something to do with it.
My suspicion is that the problem lies in database connectivity / locking. However, I'm not sure how that would cause the problem specified.
Any ideas?
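The pattern described above (memory jumps, falls back slightly, but keeps trending upwards) can be told apart from ordinary spikes by fitting a straight line to periodic memory samples. The following is only a rough sketch: the sample values are made up for illustration, and it assumes you have some way of recording the worker process's memory at intervals (for example a Performance Monitor counter log of Process\Private Bytes).

```python
# Hypothetical samples: (hours since app restart, process memory in MB).
# These numbers are invented for illustration, not taken from the question.
SAMPLES = [(0, 300), (1, 420), (2, 390), (3, 510), (4, 480), (5, 600)]


def growth_rate_mb_per_hour(samples):
    """Least-squares slope of memory (MB) against time (hours)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one point in time")
    return cov / var


def hours_until(limit_mb, samples):
    """Rough hours from the last sample until limit_mb; None if not growing."""
    rate = growth_rate_mb_per_hour(samples)
    if rate <= 0:
        return None
    last_mb = samples[-1][1]
    return max(0.0, (limit_mb - last_mb) / rate)


if __name__ == "__main__":
    print("trend: %.1f MB/hour" % growth_rate_mb_per_hour(SAMPLES))
    print("hours until 1600 MB:", hours_until(1600, SAMPLES))
```

A clearly positive slope that persists across restarts points to a steady leak rather than load spikes. It says nothing about the cause, which still needs a memory dump or profiler.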
Edit
Okay, so I'm not exactly sure I've found the problem, but we're getting closer. It's definitely SQL related. The error log reveals lots of errors since last Thursday.
It all happened after we ran some windows updates on our servers. One of the updates failed on the SQL server so not sure if this caused some problems.
The warnings we're getting are:
SQL Server has encountered XX occurrence(s) of I/O requests taking longer than 15 seconds to complete on file .. tempdb.mdf
Where XX is anything between 17 and 90! Does that sound like a deadlocking issue?
Followed by the following errors:
Unable to complete login process due to delay in opening server connection
These coincide with our log times for when the websites have been "blipping".
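To line these warnings up against the "blips", it can help to total them per file from the SQL Server error log. The warning itself reports slow disk I/O rather than lock contention, so the totals show how bad the I/O stalls are, not whether deadlocks are involved. Below is a minimal sketch. It assumes the log has been exported to plain text and that the file path appears in square brackets after "on file"; the sample lines, timestamps and paths are hypothetical, so adjust the pattern if your log looks different.

```python
import re
from collections import defaultdict

# Matches the slow-I/O warning quoted above. The bracketed file path is an
# assumption about the log layout; "occurr?ence" tolerates both spellings.
SLOW_IO = re.compile(
    r"encountered (?P<count>\d+) occurr?ence\(s\) of I/O requests "
    r"taking longer than (?P<secs>\d+) seconds to complete on file "
    r"\[(?P<file>[^\]]+)\]"
)


def summarise_slow_io(lines):
    """Return {file: (number_of_warnings, total_slow_requests)}."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = SLOW_IO.search(line)
        if match:
            entry = totals[match.group("file")]
            entry[0] += 1
            entry[1] += int(match.group("count"))
    return {name: tuple(values) for name, values in totals.items()}


if __name__ == "__main__":
    # Hypothetical log lines modelled on the warning in the question.
    sample = [
        r"10:00:01 spid2s SQL Server has encountered 17 occurrence(s) of I/O "
        r"requests taking longer than 15 seconds to complete on file "
        r"[T:\data\tempdb.mdf] in database [tempdb] (2).",
        r"10:05:01 spid2s SQL Server has encountered 90 occurrence(s) of I/O "
        r"requests taking longer than 15 seconds to complete on file "
        r"[T:\data\tempdb.mdf] in database [tempdb] (2).",
    ]
    for name, (warnings, requests) in summarise_slow_io(sample).items():
        print(name, warnings, "warnings,", requests, "slow requests")
```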
We've increased the page file size on the SQL server to the recommended size, as it was set to a max of 4 GB but the recommended size was 12 GB. I think we may need to roll back the Windows updates we did on Thursday if that doesn't fix it.
Unfortunately I can't get into Activity monitor as it tells me Timeout expired!
Edit
Okay after a reboot I'm into Activity monitor. How many sleeping processes would you say would be normal? We have roughly 127 sleeping. That's serving over 10 websites.
If there is a deadlock or timeout issue, will NHibernate not clean up its connections properly?
Okay, so in the end it seems it's quite complex: SQL deadlocks and data problems, heightened it seems by anti-virus software that was locking up or choking on a file.
Turning off the anti-virus reduced the problems, but we still need to resolve the underlying data issues.
One of our web servers is suffering from random w3wp.exe crashes, and after a couple of weeks of debugging I simply cannot figure out why. The only thing that has helped so far is reducing the max worker processes from 15 to 5. However, this isn't ideal, as we are using a multi-CPU machine in the hope of reducing the total number of servers needed. We serve a large volume of small requests, so parallel processing is a requirement.
As far as I am aware all possible sources of parallel processing collision have been addressed using thread locking.
Win 2008 64Bit SP2
IIS7
Dual 3.1Ghz Xeon
4Gb Ram
First error:
Application: w3wp.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an internal error in the .NET Runtime at IP 70D9CECA (70D40000) with exit code 80131506.
Followed straight away by:
Faulting application w3wp.exe, version 7.0.6002.18005, time stamp 0x49e023cf, faulting module clr.dll, version 4.0.30319.1, time stamp 0x4ba1d9ef, exception code 0xc0000005, fault offset 0x0005ceca, process id 0x%9, application start time 0x%10.
Many Thanks
Edit
Problem eventually solved. It turned out SQL server was unmounting the database straight after every query, so every new query had to wait for it to re-mount. Anyway, telling SQL Server not to do that seems to have solved it, no idea how but it's working so I'm happy
Exception code 0xc0000005 generally points to memory access violation. Look for any unsafe component you may be using.
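As a sanity check that the two event log entries in the question describe the same crash, the numbers can be compared directly. The sketch below assumes that the value in parentheses after the instruction pointer (70D40000) is the load address of the faulting module. The log does not say so explicitly, but if that reading is right, subtracting it from the IP should give the fault offset reported for clr.dll. The lookup table is illustrative and covers only the two codes that appear in this thread.

```python
# Values copied from the two event log entries in the question.
INSTRUCTION_POINTER = 0x70D9CECA
MODULE_BASE = 0x70D40000          # assumption: load address of clr.dll
REPORTED_FAULT_OFFSET = 0x0005CECA

# Illustrative lookup covering only the codes seen in this thread.
KNOWN_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION (native access violation)",
    0x80131506: "COR_E_EXECUTIONENGINE (fatal CLR execution engine error)",
}


def fault_offset(instruction_pointer, module_base):
    """Offset of the faulting instruction within its module."""
    return instruction_pointer - module_base


def describe(code):
    """Human-readable name for a known code, or a hex fallback."""
    return KNOWN_CODES.get(code, "unknown code 0x%08X" % code)


if __name__ == "__main__":
    offset = fault_offset(INSTRUCTION_POINTER, MODULE_BASE)
    print("computed offset: 0x%08x" % offset)
    print("matches reported offset:", offset == REPORTED_FAULT_OFFSET)
    print(describe(0xC0000005))
    print(describe(0x80131506))
```

The offsets agree, so both entries point at the same instruction inside clr.dll: the access violation happened in the runtime itself, which is consistent with corruption caused by native or unsafe code elsewhere in the process.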
You are in for a doozy of a ride. These exceptions are extremely tricky to track down and correct.
The first step is to get the IIS Debug Diagnostic Tool (v1.1). Once you have installed this, you'll need to set up some tracking projects and then attach the debugger to your running processes. Keep in mind, this tool collects a LOT of data (it can be in excess of 1GB of stuff), so combing through it may be a hassle, but it has a good potential of telling you what modules are causing the crash and what modules are interfering.
The reason w3wp.exe is crashing, though, is that an exception which cannot be handled is occurring during phases of the request in which your code, health monitoring, etc. have already completed.
In my own personal case, I found that decoupling the session from the process solved the problem. I never discovered the full reason, but the best guess we had was that the memory requirements for paging were too great for the w3wp.exe to handle all at the same time. Once we decoupled into an external session state server, the problem went away.
It may be time to rethink your web garden. Scott Forsyth has an interesting 11-minute vlog on why web gardens are counterproductive: http://dotnetslackers.com/articles/iis/Why-You-Shouldnt-Use-Web-Gardens-in-IIS-Week-24.aspx
Links to articles he mentions in his vlog:
Tuning recommendations for IIS6 and IIS7 -- read the whole article: http://support.microsoft.com/kb/821268 Further information http://blogs.msdn.com/b/tmarq/archive/2007/07/21/asp-net-thread-usage-on-iis-7-0-and-6-0.aspx
His bottom line is that if you have performance problems that are resolved by web gardens, use the web gardens as a great crutch until the underlying performance issue (usually resource contention) is resolved.
I have a server running about 100+ WordPress sites of varying complexity and traffic volume. The OS is Windows 2003 Server running IIS 6 with the domains being managed via HELM. The thing is there are times when sites stop responding due to insufficient memory, but it has been difficult to track the particular site(s) or other culprit that could be the cause. What makes it even more complicated is that the problem will disappear for weeks and then show up again. The most recent solution was to migrate the sites to a higher capacity server and this seemed to have worked for some time.
What tools/techniques can I use to track down the problem while keeping in mind that this is a production server?
Tess Ferrandez has a number of great articles about tracking down memory pressure and process hangs in IIS using WinDbg and DebugDiag:
If it is broken, fix it you should
Whilst the techniques often focus on ASP.NET, many of the techniques can be applied to other languages. The only problem is that because PHP is written using native code your WinDbg-fu will probably need to be fairly good.
Following on from my last question: my site goes slow and stops being able to access certain services externally. If we check the process monitor, we see that this is normally due to the 'w3wp.exe' process (the background process that runs the website), which regularly reaches 99-100% CPU. Killing the process or restarting the Web Publishing service resolves this. My web host says this can only be due to bad coding. Can someone comment on this?
I would also like to know of any freely available monitoring software that traces IIS.
If you are using ASP.NET then you can use the built-in ASP.NET tracing to find out things like the size of your viewstate and where the time is spent while rendering a page. There are various ways of enabling this depending on what your needs are: see http://authors.aspalliance.com/aspxtreme/webapps/tracefunctionality.aspx
99% CPU is not going to occur if you have an inefficient page or two. 99% CPU utilization happens when you have a bug.
If it does not happen on your local server, but only in the hosted environment, then you will have to resort to the old-school detective approaches: tracing, removing portions of code, and so on, until you find the source of the problem.
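One freely available source of tracing data is the IIS log itself. The sketch below ranks URLs by average response time. It assumes W3C extended logging is enabled with the cs-uri-stem and time-taken fields selected (the question does not confirm either) and that you can get a copy of the log files from the host. Note that time-taken is elapsed time in milliseconds, not CPU time, so it only hints at which pages to look at first.

```python
from collections import defaultdict


def slowest_urls(lines, top=5):
    """Rank URLs by mean time-taken (ms) from W3C extended log lines.

    Returns a list of (url, hits, mean_ms), slowest first.
    """
    fields = []
    stats = defaultdict(lambda: [0, 0])  # url -> [hits, total ms]
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The #Fields directive names the columns of the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        try:
            elapsed_ms = int(row["time-taken"])
            url = row["cs-uri-stem"]
        except (KeyError, ValueError):
            continue  # field not logged, or malformed row
        stats[url][0] += 1
        stats[url][1] += elapsed_ms
    ranked = sorted(
        stats.items(), key=lambda item: item[1][1] / item[1][0], reverse=True
    )
    return [(url, hits, total / hits) for url, (hits, total) in ranked[:top]]


if __name__ == "__main__":
    # Hypothetical log excerpt; real logs usually carry more fields.
    sample = [
        "#Fields: date time cs-uri-stem sc-status time-taken",
        "2010-01-01 10:00:00 /default.aspx 200 120",
        "2010-01-01 10:00:01 /report.aspx 200 3000",
        "2010-01-01 10:00:02 /default.aspx 200 280",
    ]
    for url, hits, mean_ms in slowest_urls(sample):
        print("%-20s hits=%d mean=%.0f ms" % (url, hits, mean_ms))
```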