SSRS Report remote connection timeout on IIS7 - iis-7

I have a set of SSRS 2005 reports that our site calls through the 2005 web service. The site runs on Win2k8 and IIS7, and the reports are hosted on a separate server from the web server.
In our QA environment, if we have a report that runs for longer than 2 minutes, I get the following error:
Message : The underlying connection was closed: A connection that was expected to be kept alive was closed by the server.
Status : KeepAliveFailure
The problem is that the SQL server shows that the stored procedure is executed correctly, the report server shows that the report was rendered correctly (after ~15 minutes), but the web server receives that error consistently after 2 minutes (120 seconds).
This same report runs with no problems on our old environment, which ran IIS6. What confuses me is that long-running reports seem to run fine in our production environment. The QA environment is a replica of production, just with less juice.
Any ideas? I've checked the Keep-Alive status and made sure it's enabled. I've tried changing the connection timeout setting from 120 seconds (the default) to 300 seconds, and it still fails at 120. I feel that we've isolated the problem to IIS7, but I'm at a loss as to where to change the timeout setting.

Any chance you're running in SharePoint Integrated Mode? If so, see "An ASP.NET Timeout Error Occurs After Two Minutes of Report Processing" in Troubleshooting Configuration Errors.

Related

ASP.NET Application Pool process often gets stuck

We have a webserver (WinSrv2019) running a few ASP.NET 4.8 websites with heavy traffic.
Each site runs in its own 64-bit Application Pool (Integrated Pipeline), .NET CLR Version v4.0.
One of the websites apparently gets stuck and stops responding.
Sometimes this lasts a few minutes; sometimes it stays stuck for an hour.
During this period our own application logs are empty.
Moreover, the IIS log of incoming requests is empty as well: requests that arrive during this period aren't logged.
On the client side, the browser doesn't get any error; it just keeps waiting for a response.
Since it happens with a single site only, we've ruled out server hardware as the cause. Server resource usage doesn't exceed 50%.
On the other hand, we have no idea what can cause an Application Pool to get stuck this way.
Please advise.
How can we investigate this issue?
Is there a tool that lets us "debug" what happens inside IIS at a low level?
Thanks in advance

Oracle doesn't release or kill inactive process and/or session from my ASP.NET Application pool

We just ran into a problem with our website. We are running an Oracle 12c database and using IIS 8 on a Windows Server 2012 Azure server. ODP.NET is the provider we use. The website is built in C# and runs as a 64-bit web application.
For whatever reason, Oracle doesn't release our connections, and after a while we get an Oracle error:
ORA-12520: TNS:listener could not find available handler for requested type of server
We reviewed all of our connection strings, adding:
Min Pool Size=10;Max Pool Size=100;Connection Lifetime=120;Connection Timeout=60;Incr Pool Size=5; Decr Pool Size=2
We more than doubled the maximum processes and sessions defined in the Oracle configuration.
That helped, but it only delayed the problem.
For now, we have a batch file that runs every 10 minutes and kills all inactive sessions that are still there after 30 minutes (ALTER SYSTEM KILL SESSION ... IMMEDIATE).
That's where we are. We have enabled the ODP.NET trace and will look at it tomorrow, but in the meantime, what can cause this problem?
Finally what I found in the ODP.NET Trace was
TID: dac (ERROR) Oracle error code=28002; Oracle msg=ORA-28002: the password will expire within 6 days
It seems that this error doesn't block access to the DB, but it does leave the session hanging in the pool.
After we fixed that, all was back to normal.
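Since the root cause only showed up as an (ERROR) line in the ODP.NET trace, a small script that tallies Oracle error codes in a trace file can make such lines easier to spot. The following is a minimal sketch: the regular expression is modeled only on the single trace line quoted above (real traces may be formatted differently), and the function name tally_oracle_errors is hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the one trace line quoted in the post; this is an
# assumption about the trace format, not a documented ODP.NET guarantee.
ERROR_LINE = re.compile(r"\(ERROR\)\s+Oracle error code=(\d+);\s*Oracle msg=(.*)")


def tally_oracle_errors(lines):
    """Count Oracle error codes and keep the first message seen for each code."""
    counts = Counter()
    first_message = {}
    for line in lines:
        match = ERROR_LINE.search(line)
        if match is None:
            continue  # not an error line in the assumed format
        code = int(match.group(1))
        counts[code] += 1
        first_message.setdefault(code, match.group(2).strip())
    return counts, first_message
```

Passing an open trace file to tally_oracle_errors and printing counts.most_common() would list the most frequent codes first, so a recurring warning such as ORA-28002 stands out.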

Windows Azure Issue - First request fails but second succeeds?

For the last 10 hours my website has been partially down due to being hosted in the West Europe region (an issue from the servers of Windows Azure).
Now they state that the servers are perfectly up and running again:
www.windowsazure.com/en-us/support/service-dashboard/
However, my issue is that when I send a GET request to my site I get a server error. But when I submit the same request again shortly afterwards, it loads just fine.
Does anyone know what the issue could be?
Azure SQL closes connections after a period of idle time. Make sure you add a retry mechanism so that when the first connection attempt to SQL fails, you try one more time.
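The retry idea can be sketched in a few lines. This is an illustrative example only: connect_with_retry is a hypothetical helper, and the connect argument stands for whatever function opens your database connection. The original site would need the equivalent logic in its own language and data-access layer.

```python
import time


def connect_with_retry(connect, retries=1, delay_seconds=1.0, retry_on=(Exception,)):
    """Call connect(); if it fails, wait and try again up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return connect()
        except retry_on:
            if attempt >= retries:
                raise  # out of retries: surface the last error to the caller
            attempt += 1
            time.sleep(delay_seconds)
```

In practice it is usually better to narrow retry_on to the specific connection errors your driver raises, so that genuine bugs are not retried silently.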

Troubleshooting an IIS .NET website outage

Last night one of the websites (.NET 4.0 forms) hosted on my Win 2008 R2 (IIS 7.5) Server started to time out throwing the following error for all connected users.
TYPE System.Web.HttpException
MESSAGE Request timed out.
DETAIL System.Web.HttpException (0x80004005): Request timed out.
The outage was confined to just one website within IIS, the others continued to work fine.
Unfortunately I was unable to identify why the website was timing out. Here are the steps I took:
First thing I did was look at the task manager which revealed normal CPU and memory usage. Network activity was also moderate.
I then opened IIS to look at the live connections under 'Worker Processes'. There were about 60 live connections, so it didn't look like anything DDoS related.
Checked database connectivity (hosted on a separate server), all fine!
I then reset the website on IIS. That didn't work.
I tried to then do a complete iisreset...still no luck :(
In the end (and under some duress) the only thing I could think to do to resolve this was to restart the server.
Restarting the server worked, but I am nervous about not knowing why this happened in the first place. Can anyone recommend any checks that I failed to carry out? Is there an official checklist for working through these sorts of IIS problems? I have reviewed the IIS logs but don't see anything unusual in the run-up to the outage.
Any pointers or links to useful resources to help me understand and mitigate against this in future will be much appreciated.
EDIT
The only time I logged into the server that day was to add an additional web handler component (for remote deploy) to IIS Web Deploy. I'm doubtful this caused the outage, as the server worked for 6 hours afterwards.
Because iisreset didn't help and you had to restart the whole machine, I would suspect a global resource shortage, with the most-used (or most resource-consuming) website being the one affected. It could be caused by a lack of available RAM, or by network connection congestion due to malfunctioning calls (for example, a lot of CLOSE_WAIT sockets exhausting the connection pool; we've seen that in production when an external service malfunctioned). It could also be a problem with one specific client that was disconnected when the machine restarted, which made the problem disappear.
I would start from:
Historical analysis
review Event Viewer to see any errors/warnings from that period of time,
although you have already looked into the IIS logs, I would do it once again with the help of Log Parser Lizard to build some statistics, such as number of requests per client, network bandwidth per client, average response time per client and so on.
Monitoring
continuously monitor Performance Counters:
\Processor(_Total)\% Processor Time,
\.NET CLR Exceptions(_Global_)\# of Exceps Thrown / sec,
\Memory\Available MBytes,
\Web Service(Default Web Site)\Current Connections (one per site name),
\ASP.NET v4.0.30319\Request Wait Time,
\ASP.NET v4.0.30319\Requests Current,
\ASP.NET v4.0.30319\Requests Queued,
\Process(XXX)\Working Set,
\Process(XXX)\% Processor Time (XXX per each w3wp process),
\Network Interface(XXX)\Bytes total / sec
run Performance Analysis of Logs (PAL) Tool during time of failure to make a very detailed analysis of performance counters data,
run netstat -ano to analyze network traffic (or, even better, use the TCPView tool)
If none of this leads you to a conclusion, create a Debug Diagnostic rule that captures a memory dump of the process for long-running requests, and analyze it with WinDbg and the PSSCor extension for .NET debugging.
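For the per-client statistics mentioned under historical analysis, a short script can serve as a rough stand-in when Log Parser Lizard is not at hand. The sketch below assumes the IIS logs are in W3C extended format with the c-ip and time-taken fields enabled; the function name per_client_stats is hypothetical.

```python
from collections import defaultdict


def per_client_stats(lines):
    """Return {client_ip: (request_count, average_time_taken)} from W3C log lines."""
    fields = []
    totals = defaultdict(lambda: [0, 0])  # client ip -> [requests, summed time-taken]
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The #Fields directive names the columns of the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        taken = row.get("time-taken", "")
        if "c-ip" not in row or not taken.isdigit():
            continue  # skip rows without the fields this sketch relies on
        entry = totals[row["c-ip"]]
        entry[0] += 1
        entry[1] += int(taken)
    return {ip: (count, total / count) for ip, (count, total) in totals.items()}
```

Sorting the result by request count or by average time-taken gives a quick view of whether a single client dominated traffic in the period before the outage.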

w3wp.exe is restarting but all GET requests are eventually queued and serviced?

I have a w3wp.exe that is restarting on my IIS server (see specs below). Memory gradually climbs to ~3 GB, then it randomly restarts itself about every 1-2 minutes.
Memory Usage:
The odd thing is that once this memory drop happens (it looks like a restart, although the app pool does not get recycled or restarted), GET requests are queued and then serviced as soon as the service warms up. This delays responses to our clients, who were initially reporting occasional delayed response times.
I followed a guide to capture a stack dump when the .exe restarts (private bytes go to ~0), but DebugDiag logs nothing (no .dmp file) once the service restarts.
I see tons of warnings in my webserver (IIS) log but that's it:
A process serving application pool 'MyApplication' suffered a fatal
communication error with the Windows Process Activation Service. The
process id was '1732'. The data field contains the error number.
ASK: I'm not sure if this is a memory limitation, if caching is not playing well with my threads/tasks, if caching is blowing up, if there is a watchdog service restarting my application, etc. Has anybody run across something similar with w3wp.exe restarting? It's hard to tell because DebugDiag is not giving me a dump once it restarts.
SPECS:
MVC4 Web API servicing GET requests (code is debug build with debug=true)
Uses MemoryCache with Model and Business Objects, with cache eviction set to 2 hrs; uses a Task (TPL) for each new request.
Database: SQL Server 2008R2
Web Servers: Windows Server 2008R2 Enterprise SP1 (64bit, 64G RAM)
IIS 7.5
One application pool...no other LOB applications running on this server
Your first step is to reproduce the problem in your test environment. Set up some kind of load generation app (you can write it yourself pretty easily) and get the same problem happening. Then turn off debug in web.config and see if that fixes the issue. Then change it to a release build and test again.
I've never used MemoryCache. Try reducing the cache eviction time, or just turn caching off, and see if that fixes the issue. Good luck :)
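A home-made load generator of the kind suggested above can be quite small. The following is a minimal sketch, not a tuned benchmarking tool: run_load and fetch are hypothetical names, the URL you pass is your own test endpoint, and the request count and concurrency are arbitrary defaults.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=30):
    """Issue one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def run_load(url, total_requests=100, concurrency=10, fetch_one=fetch):
    """Send total_requests GETs using `concurrency` threads.

    Returns a list of (latency_seconds, outcome) pairs, where outcome is the
    value returned by fetch_one or the repr of the exception it raised.
    """
    def timed(_):
        start = time.perf_counter()
        try:
            outcome = fetch_one(url)
        except Exception as exc:  # record failures instead of aborting the run
            outcome = repr(exc)
        return time.perf_counter() - start, outcome

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, range(total_requests)))
```

Running it against the test environment while watching w3wp.exe memory should show whether the restart pattern can be reproduced, and the recorded latencies make the queued-then-serviced delays visible. The fetch_one parameter exists so that the request function can be swapped out, for example to add headers or to test the harness without a network.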
