ASP.NET web app deadlocking - think it's caused by SQL Server locking

Our client's web app restarts suddenly at random intervals. For each restart, we've found an entry like this in the Windows Event Log:
Event Type: Warning
Event Source: W3SVC-WP
Event Category: None
Event ID: 2262
Date: 2/21/2010
Time: 1:33:52 PM
User: N/A
Computer: LIQUID-NXCFZ9DJ
Description:
ISAPI 'c:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\aspnet_isapi.dll' reported itself as unhealthy for the following reason: 'Deadlock detected'.
This has happened 10 times in 3 weeks: sometimes 2 or 3 times within a few hours, and at other times more than a week goes by without it happening.
The crash dump that we have shows maybe 70-80 client connections, like so:
GET request for <path here>
Mapped To URL <mapped path>
HTTP Version HTTP/1.1
SSL Request False
Time alive 00:55:24
QueryString <query string here>
Request mapped to
HTTP Request State HTR_READING_CLIENT_REQUEST
Native Request State NREQ_STATE_PROCESS
(that's 55 minutes!!! there's no reason a client connection should be around that long)
Relevant entries in machine.config:
<system.net>
<connectionManagement>
<add address="*" maxconnection="200" />
</connectionManagement>
</system.net>
and (inside <system.web>):
<deployment retail="true" />
<!--<customErrors mode="Off"/>-->
<processModel autoConfig="true"
memoryLimit="60"
maxIoThreads="200"
minIoThreads="30"
minWorkerThreads="40"
maxWorkerThreads="200"
clientConnectedCheck="00:00:05" />
<httpRuntime
minFreeThreads="20"
minLocalRequestFreeThreads="10"
enableKernelOutputCache="false"
maxRequestLength="10240" />
This latest time we were able to look at it as it was happening, and saw about 20 queries all in 'suspended' status in SQL Server. It looked like they could all have been related to one table (the Items table, a very central one for lots of different operations).
We weren't sure what the best thing to do was in the middle of the problem. When the crash occurred, SQL Server cleared out.
Any guidance on what's going on, or how to find out what's going on, would be much appreciated.

If it's a deadlock, it is a deadlock whose loop completes outside SQL, meaning you are trying to acquire process resources (e.g. a C# 'lock') while holding SQL resources (e.g. a transaction). To give an example of how this can happen, consider the following scenario:
T1 starts a SQL transaction and updates a table A in SQL
T2 locks an object in C#
T1 tries to lock the same object in C#, blocks on T2's lock
T2 reads from SQL table A, blocks on T1's update
T1 waits on T2 inside your process, T2 waits for T1 inside SQL, undetectable deadlock
Situations like this cannot be detected by SQL's deadlock monitoring, since the deadlock loop completes outside SQL. How would you diagnose such a problem? For the SQL Server side of the loop you have a lot of powerful tools at your disposal, primarily sys.dm_exec_requests, which can tell you which requests are blocked by what. Unfortunately, on the app side of the loop there is no out-of-the-box instrumentation, so you are on your own. An experienced eye can detect the problem on code inspection (doing SQL calls while holding C# locks, or acquiring C# locks in the middle of SQL transactions, are a big giveaway); otherwise you have to either exercise some masterful WinDbg-fu or instrument the code.
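To make the T1/T2 scenario concrete, here is a minimal Python sketch (an illustration only, not SQL Server's actual algorithm and not code from the app in question). It models the two waits as a wait-for graph and shows that a monitor seeing only the SQL-side waits, or only the in-process waits, finds no cycle, while the combined graph does contain one. The names WAITS, has_cycle and visible_to are made up for this example.

```python
# Wait-for edges from the T1/T2 scenario: (waiter, holder, where the wait happens).
WAITS = [
    ("T1", "T2", "process"),  # T1 blocks on the C# lock held by T2
    ("T2", "T1", "sql"),      # T2 blocks in SQL on the lock held by T1's transaction
]


def has_cycle(edges):
    """Return True if the wait-for graph built from (waiter, holder) pairs has a cycle."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, []).append(holder)

    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # we came back to a node that is still on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


def visible_to(domain):
    """Edges seen by a monitor that only knows about waits inside one domain."""
    return [(w, h) for w, h, where in WAITS if where == domain]


sql_only = has_cycle(visible_to("sql"))
process_only = has_cycle(visible_to("process"))
combined = has_cycle([(w, h) for w, h, _ in WAITS])
print(sql_only, process_only, combined)
```

Each side on its own looks like an ordinary wait, which is why neither SQL Server nor the runtime will break the loop for you.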
You should also consider that this may not be a deadlock at all. You can have your 20 SQL requests blocked by an ordinary code defect in your application, like a transaction leak on certain requests (i.e. the requests wait for the transaction that blocks them to commit, but that transaction has leaked in code and will never be closed). Again, sys.dm_exec_requests is your friend.
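As a rough illustration of how to read that view, the sketch below takes (session_id, blocking_session_id) pairs, which are two real columns of sys.dm_exec_requests, and walks each blocking chain up to its head blocker. The helper name head_blockers and the session numbers are hypothetical sample data, not output from the system described above.

```python
def head_blockers(rows):
    """Map each head blocker to the number of requests stuck behind it.

    rows: (session_id, blocking_session_id) pairs, one per running request;
    a blocking_session_id of 0 or None means the request is not blocked.
    """
    blocked_by = {sid: blocker for sid, blocker in rows if blocker}
    counts = {}
    for sid in blocked_by:
        seen = {sid}
        current = blocked_by[sid]
        # Walk up the chain until we reach a session that is not itself blocked.
        # The `seen` check only guards against looping forever on a true cycle.
        while current in blocked_by and current not in seen:
            seen.add(current)
            current = blocked_by[current]
        counts[current] = counts.get(current, 0) + 1
    return counts


# Hypothetical snapshot: 61 and 62 wait on 52, 63 waits on 61, 70 is not blocked.
SAMPLE = [(61, 52), (62, 52), (63, 61), (70, 0)]

heads = head_blockers(SAMPLE)
running = {sid for sid, _ in SAMPLE}
# A head blocker with no running request of its own is holding locks while idle.
idle_heads = [sid for sid in heads if sid not in running]
print(heads, idle_heads)
```

In this made-up snapshot, session 52 blocks three requests but has no running request itself. That is the shape a leaked transaction would be expected to take: an idle connection still holding locks.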

Check the running processes in SQL Server using the Activity Monitor.
UPDATE: I saw that this specific error is probably not SQL. I found this article on how to generate more info on the deadlock: http://support.microsoft.com/?ID=828222

Related

IIS hung requests - can't see CLR stacktraces in memory dump

ASP.NET WebAPI2 application on .NET 4.6.2, hosted on IIS on Windows Server 2016. From time to time, there are a lot (hundreds) of requests stuck for hours (despite the fact that I have a request timeout of 60s set) with no CPU usage. So I took a memory dump of the w3wp process, along with sos.dll, clr.dll and mscordacwks.dll and all my project's DLLs and PDBs from the bin directory on the server, and used WinDbg as described in many blogs and tutorials. But in all of them, they are able to see the CLR stack directly by calling ~*e !clrstack. I can see CLR stack traces for some Redis and ApplicationInsights workers, but for all other managed threads I can see only:
OS Thread Id: 0x1124 (3)
Child SP IP Call Site
GetFrameContext failed: 1
0000000000000000 0000000000000000
!dumpstack for any of these gives just this:
0:181> !dumpstack
OS Thread Id: 0x1754 (181)
Current frame: ntdll!NtWaitForSingleObject+0x14
Child-SP RetAddr Caller, Callee
000000b942c7f6a0 00007fff33d63acf KERNELBASE!WaitForSingleObjectEx+0x8f, calling ntdll!NtWaitForSingleObject
000000b942c7f740 00007fff253377a6 clr!CLRSemaphore::Wait+0x8a, calling kernel32!WaitForSingleObjectEx
000000b942c7f7b0 00007fff25335331 clr!GCCoop::GCCoop+0xe, calling clr!GetThread
000000b942c7f800 00007fff25337916 clr!ThreadpoolMgr::UnfairSemaphore::Wait+0xf1, calling clr!CLRSemaphore::Wait
000000b942c7f840 00007fff253378b1 clr!ThreadpoolMgr::WorkerThreadStart+0x2d1, calling clr!ThreadpoolMgr::UnfairSemaphore::Wait
000000b942c7f8e0 00007fff253d952f clr!Thread::intermediateThreadProc+0x86
000000b942c7f9e0 00007fff253d950f clr!Thread::intermediateThreadProc+0x66, calling clr!_chkstk
000000b942c7fa20 00007fff37568364 kernel32!BaseThreadInitThunk+0x14, calling ntdll!LdrpDispatchUserCallTarget
000000b942c7fa50 00007fff3773e821 ntdll!RtlUserThreadStart+0x21, calling ntdll!LdrpDispatchUserCallTarget
So I have no idea where to look for the bug in my code.
(here is the full result:
https://gist.github.com/rouen-sk/eff11844557521de367fa9182cb94a82
and here is the results of !threads:
https://gist.github.com/rouen-sk/b61cba97a4d8300c08d6a8808c4bff6e)
What can I do? A Google search for "GetFrameContext failed" gives nothing helpful.
As mentioned, this is not trivial; however, you can find a case study of a similar problem here: https://blogs.msdn.microsoft.com/rodneyviana/2015/03/27/the-case-of-the-non-responsive-mvc-web-application/
In a nutshell:
Download NetExt. It is the zip file here:
https://github.com/rodneyviana/netext/tree/master/Binaries
Open your dump and load NetExt
Run !windex to index the heap
Run !whttp -order -running to see a list of running requests
If a request contains a thread number, you can go to that thread to see what is happening
If a request contains --- instead of a thread number, it is waiting for a thread, and this is a sign that some throttling is happening
If it is a WCF service, run !wservice to see the services
Run !wruntime to see runtime information
Run !wapppool to see Application Pool information
Run !wdae to list all errors
... And so it goes. When you do this again and again you will be able to spot issues easily

BizTalk 2006 Event Log Warnings - Cannot insert duplicate key row in object 'dta_MessageFieldValues' with unique index 'IX_MessageFieldValues'

We have been seeing the following 'warnings' in the event log of our BizTalk
machine since upgrading to BTS 2006. They seem to occur
randomly 6 or 8 times per day.
Does anyone know what this means and what needs to be done to clear it up?
We have only one BizTalk server, which is running on only one machine.
I am new to BizTalk, so I have been unable to find out how many tracking host instances are running for the BizTalk server. Also, can you please let me know whether we can configure only one instance for one server/machine?
Source: BAM EventBus Service
Event: 5
Warning Details:
Execute batch error. Exception information: TDDS failed to batch execution
of streams. SQLServer: bizprod, Database: BizTalkDTADb.Cannot insert
duplicate key row in object 'dta_MessageFieldValues' with unique index
'IX_MessageFieldValues'.
The statement has been terminated..
I see you got a partial answer in your MSDN Post
Go to the BizTalk Admin Console and check under Platform Settings -> Hosts: in the list of hosts on the right, confirm that only a single host has the Tracking column marked as Yes.
As to your other question: yes, you can run a single host instance on a single server, although when your server starts to come under a bit of load you may want to consider setting up some more so you can balance the workload better.

run oracle reports from pl/sql procedure

I'm refactoring a very old report-generating function in an Oracle web application. It used JavaScript to construct a URL and send it to the reports server to run a report. What I want to do is handle this in the database, in PL/SQL procedures (invoked through mod_plsql). I tried to use utl_http.begin_request to do that, but sometimes, when the output file is large (PDF format, about 20 pages, 1.5 MB), I received an error:
ORA-29259: end-of-input reached.
The test codes for sending requests are quite simple:
--------upgraded 2013/08/27----------------------------------------------------------
UTL_HTTP.set_transfer_timeout(1000);
--some params setting....
myIdent := SRW.RUN_REPORT(myPlist); -- the exception is raised here (ORA-29273: request_failed; ORA-29259: end-of-input reached) and the procedure stops
r_stat := SRW.report_status(myIdent,myPlist);
@ThinkJet, thanks for your help.
I logged on to the report server and found that the report was still running after I got this exception in my program, and that it eventually finished successfully. I tested many times and found that every time I got the exception just 5 minutes after I started the request, no matter what kind of report I was running or how large it was (all of them large, running for over 5 minutes). I'm wondering if it has something to do with the configuration of the Oracle application server?
Does anyone have idea about this? Many Thanks.
ORA-29259 during SRW.RUN_REPORT
On iAS 10g, this is caused by the Timeout parameter in httpd.conf.
Just:
1. Edit $ORACLE_HOME/Apache/Apache/conf/httpd.conf
and change "Timeout" to a value greater than your report's running time.
2. Restart the HTTP server and test it:
opmnctl restartproc ias-component=HTTP_Server
It takes time on the Oracle Reports side to produce a big report, so timeouts can sometimes occur. Try increasing the timeout with the utl_http.set_transfer_timeout procedure before originating the request and see if it helps.

How can I remove Host Instance Zombies from BTMessageBox

After moving most of our BT-Applications from a BizTalk 2009 to a BizTalk 2010 environment, we began the work of removing old applications and unused hosts. In this process we ended up with a zombie host instance.
This has resulted in bts_CleanupDeadProcesses starting to fail with the error “Executed as user: RH\sqladmin. Could not find stored procedure 'dbo.int_ProcessCleanup_ProcessLabusHost'. [SQLSTATE 42000] (Error 2812). The step failed.”
After looking at the bts_CleanupDeadProcesses procedure, I found the zombie host instance in the BTMsgBox.ProcessHeartBeats table, with dtNextHeartbeatTime set to the time when the host was removed.
(I'm assuming that the Host Instance Processes don't exist in your services any longer, and that the SQL Agent job fails)
From looking at the source of the [dbo].[bts_CleanupDeadProcesses] job, it loops through the dbo.ProcessHeartbeats table with a cursor (btsProcessCurse, lol) looking for 'dead' heartbeats.
Each process instance has its own cleanup sproc, int_ProcessCleanup_[HostName], and a sproc for the heartbeat watchdog to call, viz. bts_ProcessHeartbeat_[HostName] (although for whatever reason the sproc calls it @ApplicationName). The cursor is filtered by WHERE (s.dtNextHeartbeatTime < @dtCurrentTime).
It is thus tempting to just delete the record for your deleted / zombie host (or, if you aren't that brave, to simply update dtNextHeartbeatTime on the heartbeat record for your dead host instance to sometime next century). Either way, the SQL Agent job should skip the dead instances.
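The effect of that filter can be sketched in a few lines of Python. This is only a model of the WHERE clause quoted above, not the BizTalk stored procedure itself; the host names, dates and the helper dead_hosts are invented for illustration.

```python
from datetime import datetime


def dead_hosts(heartbeats, now):
    """Return the hosts whose next expected heartbeat is already in the past."""
    return [host for host, next_beat in heartbeats.items() if next_beat < now]


now = datetime(2013, 1, 15, 12, 0)
heartbeats = {
    "LiveHost": datetime(2013, 1, 15, 12, 1),    # still checking in
    "ZombieHost": datetime(2012, 11, 3, 9, 30),  # frozen when the host was removed
}

# The zombie row is selected, so the job would try to call its missing cleanup sproc.
before = dead_hosts(heartbeats, now)

# Pushing the timestamp far into the future means the filter no longer selects it.
heartbeats["ZombieHost"] = datetime(2100, 1, 1)
after = dead_hosts(heartbeats, now)
print(before, after)
```

Deleting the row has the same effect on the filter as moving the timestamp; the update is simply easier to undo.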
An alternative could be to try and re-create the Host and Instances with the same name through the Admin Console, just to delete them (properly) again. This might however cause additional problems as BizTalk won't be able to create the 2 SPROCs above because of the undeleted objects.
However, I obviously wouldn't do this on your prod environment until you've confirmed it works with a trial run first.
It looks like someone else got stuck with a similar situation here
And there is also a good dive into the details of how the heartbeat mechanism works by XiaoDong Zhu here
Have you tried BTSTerminator? That works for one-off cleanups.
http://www.microsoft.com/en-us/download/details.aspx?id=2846

Getting error "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding."

My site runs correctly under IIS, but after working for a long time it starts giving the timeout error below, as if the server were busy doing other work. However, SQL Server is running on my local host with no usage or load other than the current application.
"Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding."
Only after restarting the system can I resume my work. I need help with this. Thanks in advance.
It sounds like you're hitting the execution timeout. If you're reaching this threshold, you may want to do some profiling to see if there's a performance bottleneck somewhere. To work around this issue, you can specify an execution timeout in the web.config:
<httpRuntime executionTimeout="180" /> <!-- in seconds -->
To change the execution timeout in SQL Server go to the server properties:
http://img197.imageshack.us/img197/3152/srvrprops.png
See here for more details:
Changing the CommandTimeout in SQL Management studio
I have used the recompile to solve this issue.
EXEC sp_recompile N'myproc'
Finally resolved this issue by clearing out the execution plan from the procedure cache.
To clear the plan cache run the following scripts...
SELECT plan_handle, st.text
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS st
where objtype = 'Proc' AND st.[text] like '%[Enter stored procedure name here]%'
If the stored procedure exists in the cache, your results should look like this:
From the results returned replace the plan_handle in the example below...
-- Remove the specific execution plan from the cache.
DBCC FREEPROCCACHE (0x050017005CBF201940A1CE91000000000000000000000000);
GO
More information here
