Service Unavailable - IIS - http

My problem is that the CPU usage on the web server sometimes spikes to 100% (caused by w3wp.exe).
At that moment the website returns "Service Unavailable".
Question: Where in the IIS/HTTPERR logs can I see when the website became "Service Unavailable"?
Can I use Log Parser to identify at what times this happened? If so, is there an example query?
Thank You
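
For the Log Parser part: here is a rough, untested sketch of a query over the HTTP.SYS error logs, assuming Log Parser 2.2's HTTPERR input format and the default log location (adjust the path and the 5-minute bucket for your system):
LogParser "SELECT date, QUANTIZE(time, 300) AS Period, s-reason, COUNT(*) AS Hits FROM C:\Windows\System32\LogFiles\HTTPERR\*.log WHERE sc-status = 503 GROUP BY date, Period, s-reason ORDER BY Hits DESC" -i:HTTPERR
The s-reason column (values such as AppOffline, QueueFull or Disabled) usually hints at why HTTP.SYS returned 503 at that time.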

You could create a user-mode dump file of the process and use the Debug Diagnostics Tool to analyze what happened. The tool is part of the IIS Diagnostics Toolkit (download and description here). It is located in the folder C:\Program Files\IIS Resources\DebugDiag.
This support article explains in detail how to do that:
How to use the Debug Diagnostics Tool to troubleshoot high CPU usage by a process in IIS

Dunno if this is any food for thought, but this is what we do:
When our page rendering time goes over a certain acceptable threshold, we mark the server as "busy" and all new sessions are denied with a "Server busy" message. That lets people with open sessions finish up, lightens the load, and frees up resources so that creation of new sessions can resume.
We do this by recording the average task duration each minute and checking whether the average over the last five minutes exceeds the threshold; if it does, we set the Busy flag. The flag is cleared on a later recalculation (a task scheduled at a one-minute interval) once the 5-minute moving average falls below the threshold again.
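Roughly, the recalculation looks like this (a minimal C# sketch of the idea; ServerLoadMonitor, BusyThresholdMs and the 2000 ms threshold are made-up names for illustration, not our production code):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class ServerLoadMonitor
{
    const double BusyThresholdMs = 2000;            // acceptable average render time (illustrative)
    static readonly List<double> currentMinute = new List<double>();
    static readonly Queue<double> lastFiveMinutes = new Queue<double>();
    static readonly object gate = new object();
    static readonly Timer recalcTimer =
        new Timer(_ => Recalculate(), null, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1));

    public static volatile bool IsBusy;             // checked before creating a new session

    // Called at the end of every request with its render time in milliseconds.
    public static void RecordSample(double milliseconds)
    {
        lock (gate) { currentMinute.Add(milliseconds); }
    }

    // Runs once a minute: fold the last minute's average into a five-slot window,
    // then set or clear the Busy flag from the moving average.
    static void Recalculate()
    {
        lock (gate)
        {
            lastFiveMinutes.Enqueue(currentMinute.Count > 0 ? currentMinute.Average() : 0);
            currentMinute.Clear();
            while (lastFiveMinutes.Count > 5) lastFiveMinutes.Dequeue();
            IsBusy = lastFiveMinutes.Average() > BusyThresholdMs;
        }
    }
}

Requests that arrive without an existing session check ServerLoadMonitor.IsBusy and get the "Server busy" response while it is set.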

Related

Win 10 Task Scheduler keeps disabling tasks

I use Task Scheduler to run a .bat file that zips and encrypts files and does a nightly backup of them to an external drive. I have used this for years and it has worked just fine. Starting a couple of months ago, the task is getting disabled in Task Scheduler and I cannot figure out why. I can enable it, and it will run again, but then gets disabled again the same day or in the next day or two. Then I noticed that it is not just this task, but other scheduled tasks (that I did not write) that are also getting disabled, all at the same time. Things like Google software update are getting disabled too.
I have been looking at the task history to see when tasks are disabled, and there is no common denominator that I can think of.
When I restart, all tasks are enabled. But what causes them to be disabled?????
I had this for months before I narrowed it down, with the help of posts from the likes of Paul above. Every task kept getting disabled, sometimes several times a day. In my case it was AVG anti-virus.
In AVG's settings there is a tools menu, and in there is a series of check boxes called "Do Not Disturb Mode". It seems AVG is trying to 'help' the user by turning off notifications when a program is running full screen, but it's overzealous and turns off all scheduled tasks too.
I unchecked all those tick boxes and didn't have the problem for two months. Then as a test I turned them on again and the tasks got disabled in just an hour or so.
I have had several AVG updates over the period and it's still present, so I don't think it's a bug as much as a feature.
I had the same problem that started in the last few weeks.
If you are running Avast anti-virus, there is a bug [feature] that disables scheduled tasks and apparently never re-enables them. See this link: https://forum.avast.com/index.php?topic=249063.0
I had this problem last year. If my memory serves me right, it happens when the Windows 10 clock doesn't synchronise and the W32time service has stopped for some reason. Go into Task Manager, select Services, and look for W32Time to see if it is running. If it isn't, right-click it and restart it.
"Maintains date and time synchronization on all clients and servers in the network. If this service is stopped, date and time synchronization will be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start."
Also check the Windows Time service (W32time) in Windows Services to see whether its startup type is Manual or Automatic, and change it to start automatically.
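From an elevated command prompt you can check and fix this with the standard sc, net and w32tm tools, for example (illustrative; note the required space after start=):
sc query w32time
sc config w32time start= auto
net start w32time
w32tm /resync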
Wow, thanks for this post. I would never have found out why my scheduled tasks were being disabled. I thought I was being hacked! Ironically it is Avast's "Do not disturb" feature. I had an app set to auto-restart, and when this app restarted at the same clock time, all of my scheduled tasks would be disabled. Avast automatically decides to set multiple apps to "Do not disturb = True".
More details: https://forum.avast.com/index.php?topic=249063.0

ASP.NET website takes forever to respond

My ASP.NET web application encounters downtime every day; it takes forever to respond. But once I stop and start the website in IIS (not an iisreset), it works again. Then hours or a day later it becomes unresponsive again. What could be the reason? I suspect an unclosed connection to the database, but they are hard to find. The code was written by the previous programmer.
Check the queue length, which is a setting on the application pool.
If it's happening during a particular time of the day, check the resource utilization (CPU/RAM consumed) during that time.
There are APM tools such as Application Insights available which you can use to monitor the response time of requests.
You can implement Google Analytics to see the number of users online or making requests, to check whether it's a threshold issue.
Look into the IIS logs during the time of the issue and check the time-taken field (an example Log Parser query is shown after these steps). If it's above normal, proceed to the following step.
During the time of issue (before you restart the website), capture a manual hang dump of the w3wp process - https://blogs.msdn.microsoft.com/debugdiag/2013/03/15/debug-diagnostic-1-2-generate-a-manual-hang-dump-on-a-specific-process/
Run a Debug Diag report and share it if you can. It'll tell you what is possibly going wrong.
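As a sketch, a Log Parser query like this over the W3C logs can surface the slow requests during that window (it assumes the default log folder, that the time-taken field is being logged, and site ID 1; the 10-second threshold is arbitrary):
LogParser "SELECT TOP 25 date, time, cs-uri-stem, sc-status, time-taken FROM C:\inetpub\logs\LogFiles\W3SVC1\*.log WHERE time-taken > 10000 ORDER BY time-taken DESC" -i:IISW3C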

BizTalk MessageBox > 60GB

I have a BizTalk 2010 Server up and running for a few months.
Recently we noticed that the MessageBox keeps growing and throttling has kicked in.
In production, the MessageBox is > 60 GB and the trackingdata view returns 1.5m records.
In our acceptance environment, the trackingdata view returns about 300k records.
We neglected to create a dedicated tracking host in the first place, but managed to create one in acceptance last week. The dedicated tracking host has not changed anything in our acceptance environment, so I have not yet created one in production.
All jobs are enabled and run continuously without an error.
In acceptance, I do not have any running/suspended messages.
I also can't find any exceptions in the event log.
I'm looking forward to any hints on improving the setup and reducing the MessageBox size.
Thanks & best regards
Michael
Have you checked whether the SQL Server jobs that purge records from the database are running?
Look at the SQL Server Agent job named 'MessageBox_Message_Cleanup_BizTalkMsgBoxDb' (there are others there as well). They normally run periodically to clean things up. You might run the job manually, see if there are errors, and check the Job Activity Monitor.
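If it helps, you can also check the Agent jobs' state and recent history from SSMS with the standard msdb procedures, for example (illustrative):
EXEC msdb.dbo.sp_help_job;  -- lists all Agent jobs with their enabled state and last run outcome
EXEC msdb.dbo.sp_help_jobhistory @job_name = 'MessageBox_Message_Cleanup_BizTalkMsgBoxDb';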

IIS startup delay with ASPX pages

Environment: Windows Server 2003; IIS 6, ASP.NET 2.0.50727
I'm going crazy with a brand new web server that we set up (note that this problem doesn't happen on our other web servers, which have the same configuration). When loading an ASP.NET app for the first time, the page hangs for over a full minute before showing in the browser. After it loads the first page, everything runs very quickly.
Note 1: You will probably say that the application is being compiled for the first time. But I've ruled that out. I put trace messages EVERYWHERE in the app and all the trace messages run within a second of requesting the page. Thus, the app compiles and runs immediately. But when the app is finished rendering the page and my last trace message is printed, nothing happens. IIS is doing something behind the scenes for a full minute before transferring the finished page along http to the user's browser.
Note 2: We found that after hitting the app the first time and things run fine, if we wait an hour then we get the delay again. Thus, IIS has something in its cache that it clears out after an hour and causes our site to stall again.
Note 3: Between each test we stop/start IIS to force it to hang upon loading the app.
Note 4: We watched the Task Manager to see if IIS was spiking and taking up a lot of resources processing something. But that wasn't it. We did see a very quick spike to 50% immediately before the browser showed the page, but for the previous 60 seconds there was only 1% usage on the server.
Note 5: On another test I created a HelloWorld.html page and this does not cause IIS to hang. Thus, it has something to do with calling the ASP.NET library the very first time it sends a rendered page across http. Also, since the app has already been compiled and runs instantly, it's just the part of asp.net that sends the rendered page to the user's browser that causes the delay.
Any ideas? We are at a loss here. All of our other web servers are set up the same way and work fine, but this is a new install. So there must be a configuration setting that was missed, or maybe something needs to be installed?
Thanks,
Brian
If you have access to the servers, then make sure that app pool recycling is actually logged to the event logs
cscript adsutil.vbs get w3svc/AppPools/DefaultAppPool/LogEventOnRecycle
you can set it to log everything with
cscript adsutil.vbs Set w3svc/AppPools/DefaultAppPool/LogEventOnRecycle 255
See more here
Then check if there were any recycles.
App initialization (creating the worker process and threads, loading the app domain and all the referenced DLLs) can take some time; that's normal, but that one-minute delay is probably something else.
Try to precompile the app on the server and see if that helps
aspnet_compiler -m /LM/W3SVC/[site id ]/Root/[your appname]
If you want to dig deeper, you can check event tracing (ETW):
logman query providers
Save the IIS/ASP.NET-related provider GUIDs to a file such as iisproviders.txt
logman start ExampleTrace -pf iisproviders.txt -ets -rt
reproduce the issue
LogParser "SELECT * FROM ExampleTrace" -i:ETW
logman stop ExampleTrace -ets
You can find more here: Troubleshooting app domain restarts and other issues with ETW tracing.
I would also check w3wp.exe with Process Explorer (procexp) to see whether a TCP connection is timing out, or with Process Monitor (procmon) for other clues.
If you have experience with windbg, then you can make a request to the app then quickly attach the debugger to the process
windbg -p [process id of the app pool]
.loadby sos mscorwks
g
and take it from there. If there are exceptions, process crash, etc you should be able to catch it...
Once we had a weird server issue like this and a .NET reinstall solved the problem, still not sure what was the culprit.
Could be some aspnet.config settings on this box that are different from the others. Have you tried copying over their config files to this server? There appear to be certificate options, along with registry modifications, that you can apply to remove some lag during the initial load of a page (precompiling aside).
See here and here
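One setting often mentioned for exactly this symptom on a fresh server (and possibly what the links above refer to) is the Authenticode publisher-evidence check: at startup the CLR tries to verify the certificate revocation list for signed assemblies, and on a box without internet access it waits for that check to time out. For ASP.NET 2.0 it can be switched off in the aspnet.config next to the runtime (C:\Windows\Microsoft.NET\Framework\v2.0.50727\aspnet.config here), e.g.:
<configuration>
  <runtime>
    <generatePublisherEvidence enabled="false" />
  </runtime>
</configuration>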
One thing you might want to check is whether any database access is going on during your page load. That might be blocking the creation of the page on the initial load. Once the query is cached (either by the DB engine or another cache mechanism like memcached), subsequent page loads work as normal.
As per your last comment,
I could stop/start IIS multiple times and the app always ran instantly. I thought it was fixed for good. But now I just tried again (it has been sitting idle for the past couple of hours) and now it is back to hanging on the first request.
This could mean that the cache has expired and thus needs to hit the database once again, causing the delay in page load.

What is consuming over 65% of time in an ASP.NET application?

I have a .NET Framework 4.0 ASP.NET MVC 3 web application hosted on Windows 7 / IIS 7.5. IIS logging is enabled on this machine and set to log in W3C mode.
The application is compiled using the Release configuration and has been deployed to IIS with the <compilation debug="false"> attribute set explicitly. The Web.config specifies the use of SQL Server-based session state.
I have added the following statements to Global.asax, in the BeginRequest and EndRequest events respectively. The results, i.e. sw.Elapsed.TotalMilliseconds, are stored in an application-level list of values. I dump these values out via a debug page and take their average.
// in BeginRequest
HttpContext.Current.Items.Add("RequestStartEnd", System.Diagnostics.Stopwatch.StartNew());
// in EndRequest
var sw = (System.Diagnostics.Stopwatch)HttpContext.Current.Items["RequestStartEnd"];
sw.Stop();
I have created a load test which runs a single request against this application with a concurrent user load of 20 users. The test is run in Visual Studio 2010 Ultimate edition.
After running the load test, I get an average time-taken, as recorded by the stopwatch, of 681 milliseconds. The average time-taken as per IIS for these requests (I cleaned out all logs before running the load test) is 2121 milliseconds. The average time-taken as per IIS tallies with the value shown in the Visual Studio load test report.
The stopwatch time-taken only accounts for 32% of time-taken as reported by IIS logs / Visual Studio. Where does the other 68% time go?
Update 1:
I set the session state to InProc and re-ran the load test. In this scenario the difference between the average time reported by stopwatch and average time-taken reported by IIS logs grew to more than 70%!!! Where is all that time going?
Update 2:
#Peter - I tried out Failed Request Tracing by putting in a trace rule that logs on a status code of 200. Next, I ran the load test with 20 concurrent users for approx. 1.5 minutes. I went through the last 50 trace files and found that the "Time Taken" field in those reports ranged from 750 ms to 1300 ms. The Visual Studio report showed the avg. time taken as 2300 ms. In the report, using the compact view, I see that the time taken changes across the following transitions:
(1) AspNetStart -> AspNetAppDomainEnter
(2) ManagedPipelineHandler-start -> ManagedPipelineHandler-end
Item (2) is probably my application's code. Still, there is a big difference between the maximum time-taken as per the failed request logs, i.e. 1300 ms, and the avg. time-taken shown by Visual Studio, 2300 ms. How do I account for that? Thanks for this great tip though!
How long are you running your load test for? If you're running it under Windows 7 you may get different results than under a Windows Server OS as well. You could be having issues with getting threads allocated to the thread pool under burst loads. .NET will immediately allocate threads up to the thread pool minimum and then slowly allocate up to the thread pool maximum. I believe the default settings for client OSes are different from those for server OSes. It may be that on a client OS the default minimum thread setting is equal to the number of cores on your machine, meaning that .NET will then only slowly allocate more threads to meet your burst load.
A simple check would be to let your load test run longer and see if the gap between your 2 measurements narrows.
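If the thread-pool ramp-up is the suspect, a quick experiment is to raise the minimum worker threads before the burst hits, e.g. in Application_Start (the value 50 is purely illustrative; the more usual place for this is the processModel minWorkerThreads setting in machine.config):
// in Global.asax Application_Start - illustrative experiment only
int workerMin, ioMin;
System.Threading.ThreadPool.GetMinThreads(out workerMin, out ioMin);
System.Threading.ThreadPool.SetMinThreads(System.Math.Max(workerMin, 50), ioMin);
If the gap between your two measurements narrows once the pool is warm (or with a higher minimum), the burst allocation behaviour described above is the likely explanation.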
There is a better way to look into the internals of your app, by using "Failed Request Tracing Rules"
http://learn.iis.net/page.aspx/266/troubleshooting-failed-requests-using-tracing-in-iis-7/
With that you can follow exactly what your app is doing in IIS.
I would suggest looking into MvcMiniProfiler. It's a NuGet package you could add and wire in (almost effortlessly) that really breaks down the execution times of various points in your MVC app. You could see a live view of each request and where any bottlenecks reside.
More info:
http://code.google.com/p/mvc-mini-profiler/
http://www.hanselman.com/blog/NuGetPackageOfTheWeek9ASPNETMiniProfilerFromStackExchangeRocksYourWorld.aspx
http://www.nuget.org/List/Packages/MiniProfiler
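For reference, wiring it up is roughly this in Global.asax (a hedged sketch; the namespace is MvcMiniProfiler or StackExchange.Profiling depending on the package version you pull):
// in Global.asax
protected void Application_BeginRequest() { if (Request.IsLocal) MiniProfiler.Start(); }
protected void Application_EndRequest()   { MiniProfiler.Stop(); }
Its timings then show up as an overlay on each page, broken down step by step, which makes it easy to see whether the time is spent in your code or elsewhere in the pipeline.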
It's most likely a combination of the managed pipeline overhead and your network (even localhost or 127.0.0.1); that is probably where the "lost" time can be accounted for.
You mention that the IIS logs tally with your Visual Studio load test figures - both of those involve the network stack and the managed pipeline.
Your stopwatch code only executes inside the ASP.NET context (just before ASP.NET execution begins and just before it ends), and does not take into account the IIS overhead of processing a TCP network request, parsing HTTP headers for both the request and response, and transmission time through the managed pipeline and TCP stack.
[EDIT] The proportion is exaggerated even more if your ASP.NET page execution time is really fast, e.g. it does not consume a lot of CPU relative to the rest of the TCP stack and managed pipeline that marshals the HTTP request for you.
If you are having issues with long load times, it may be related to several things. A high number of IO operations can affect this.
If you are doing database queries with an ORM, try profiling your SQL statements, either in Visual Studio with a SQL profiler or in SQL Server's built-in profiler. If you are seeing a lot of individual queries, consider using some Includes in your LINQ queries to bundle up some of your data into single queries.
You may also want to consider using async controllers to improve response times if you have a lot of IO operations, so your application isn't continuously waiting for each IO operation to complete (a rough sketch follows at the end of this answer).
see wintellect.com/CS/blogs/jprosise/archive/2010/03/29/…
There are also better solutions out there for logging than IIS logging, which is a fairly old component. You may want to take a look at log4net for general logging, or ELMAH for logging errors and exceptions, as these solutions can bundle up a bunch of log entries and write several in a single operation, also improving IO performance.
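Here is a rough sketch of the MVC 3 async controller pattern mentioned above (the controller name and URL are made up; the Async/Completed pair and AsyncManager usage follow the standard System.Web.Mvc.AsyncController pattern):
public class ReportsController : System.Web.Mvc.AsyncController
{
    public void IndexAsync()
    {
        AsyncManager.OutstandingOperations.Increment();
        var client = new System.Net.WebClient();
        client.DownloadStringCompleted += (sender, e) =>
        {
            AsyncManager.Parameters["feed"] = e.Result;   // key must match the Completed method's parameter name
            AsyncManager.OutstandingOperations.Decrement();
        };
        client.DownloadStringAsync(new System.Uri("http://example.com/slow-feed"));   // stand-in for any slow IO call
    }

    public System.Web.Mvc.ActionResult IndexCompleted(string feed)
    {
        return Content(feed);
    }
}
While the IO is in flight, the request's thread goes back to the pool instead of blocking until the operation completes.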
