Deadlocks in SQL Server 2008 R2

Server: SQL Server 2008 R2
Clients: Excel/ADO - 8 clients
Server hardware: 8 cores / 16 GB memory / OS: Windows Server 2008 R2
Deadlocks are happening on both Insert/Update and Merge/Matched stored procedures.
I have read a lot here about insert/update deadlocks, and as a result I changed my Insert/Updates to Merge/Matched, but I am still getting VERY frequent deadlock errors (about one every 10 minutes) from just 8 clients running in batch mode, each calling for updates at a rate of 2 per minute.
In contrast, each client inserts about 20,000 items per minute to another table with no issues at all.
I would love some assistance in solving these deadlock issues, as I don't think a measly 8 clients (especially Excel/ADO/VBA) should be able to stress this DB!
Also note that I do not issue any SQL commands directly from the clients; all SQL commands are called through stored procedures.
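For reference, the deadlock graphs themselves can be pulled either from trace flag 1222 output in the error log or from the built-in system_health Extended Events session that ships with SQL Server 2008 and later. A rough sketch (assuming the default ring_buffer target is still in place):
-- option A: write deadlock graphs to the SQL Server error log (instance-wide)
dbcc traceon (1222, -1);

-- option B: read recent deadlock reports from the system_health session
select XEventData.XEvent.value('(data/value)[1]', 'nvarchar(max)') as DeadlockGraph
from (
    select cast(st.target_data as xml) as TargetData
    from sys.dm_xe_session_targets as st
    join sys.dm_xe_sessions as s on s.address = st.event_session_address
    where s.name = 'system_health' and st.target_name = 'ring_buffer'
) as Data
cross apply TargetData.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') as XEventData (XEvent);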
My current SP:
merge [dbo].[File_Level_Data] as TargetMerge
using (select @Name_of_File as name) as source
on (TargetMerge.Name_of_File = source.name)
when matched then
    update set
        XXX1  = @XXX1,
        ZZZ25 = @ZZZ25
when not matched then
    insert
    (
        Name_of_File,   -- included so a newly inserted row matches on later calls
        XXX1,
        ZZZ25
    )
    values
    (
        @Name_of_File,
        @XXX1,
        @ZZZ25
    );
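As an aside, and not necessarily the cause here (the answer below points at a trigger), a single-row MERGE upsert like this is itself known to deadlock under concurrency unless the target is read with a serializable hint. A sketch of the commonly suggested variant, using the same columns as above:
-- sketch only: same upsert with a HOLDLOCK (serializable) hint on the target,
-- the usual recommendation for concurrent single-row MERGE upserts
merge [dbo].[File_Level_Data] with (holdlock) as TargetMerge
using (select @Name_of_File as name) as source
on (TargetMerge.Name_of_File = source.name)
when matched then
    update set XXX1 = @XXX1, ZZZ25 = @ZZZ25
when not matched then
    insert (Name_of_File, XXX1, ZZZ25)
    values (@Name_of_File, @XXX1, @ZZZ25);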

My deadlocks were being caused by triggers I had put in there years ago and didn't believe were still there. Once I removed them, no more deadlocks.
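For anyone checking the same thing, the catalog views make it easy to confirm whether forgotten triggers are still attached to the table. A small sketch against the table from the question:
-- list any triggers still defined on the target table, with their definitions
select tr.name as trigger_name,
       tr.is_disabled,
       object_definition(tr.object_id) as trigger_body
from sys.triggers as tr
where tr.parent_id = object_id(N'dbo.File_Level_Data');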

Related

Cosmos DB Emulator hangs when pumping continuation token, segmented query

I have just added a new feature to an app I'm building. It uses the same working Cosmos/Table storage code that other features use to query and pump results segments from the Cosmos DB Emulator via the Tables API.
The emulator is running with:
/EnableTableEndpoint /PartitionCount=50
This is because I read that the emulator defaults to 5 unlimited containers and/or 25 limited ones, and since this is a Tables API app, the table containers are created as unlimited.
The table being queried is the 6th to be created and contains just 1 document.
It either takes around 30 seconds to run a simple query, "tripping" my Too Many Requests error handling/retry in the process, or it hangs seemingly forever with no results returned and the emulator has to be shut down.
My understanding is that with 50 partitions I can make 10 unlimited tables/collections, since each is "worth" 5. See documentation.
I have tried with rate limiting on and off, and jacked the RU/s to 10,000 on the table. It always fails to query this one table. The data, including the files on disk, has been cleared many times.
It seems like a bug in the emulator. Note that the "Sorry..." error that I would expect to see upon creation of the 6th unlimited table, as per the docs, is never encountered.
After switching to a real Cosmos DB instance on Azure, this is looking like a problem with my dodgy code.
Confirmed: my dodgy code.
Stand down everyone. As you were.

Design advice on processing large-volume files in parallel

I am looking for design advice on the use case below.
I am designing an application which can process EXCEL/CSV/JSON files. They all contain the same columns/attributes, about 72 of them, and each file may contain up to 1 million records.
Now I have two options to process those files.
Option 1
Service 1: Read the content from the given file, convert each row into JSON, and save the records into a SQL table by batch processing (3K records per batch).
Service 2: Fetch those JSON records from the database table (saved in step 1), process them (validation and calculation), and save the final results into a separate table (a rough sketch of this staging shape is below).
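A minimal sketch of what that staging shape could look like, with purely illustrative table and column names (not from the actual project); since this is SQL Server 2019, Service 2 can shred the stored JSON with OPENJSON:
-- illustrative staging table for Option 1 (names are hypothetical)
create table dbo.FileRowStaging
(
    StagingId   bigint identity primary key clustered,
    FileName    nvarchar(260) not null,
    RowJson     nvarchar(max) not null,
    Status      tinyint not null default 0,        -- 0 = pending, 1 = processed, 2 = failed
    LoadedAtUtc datetime2 not null default sysutcdatetime()
);

-- Service 2: pick up pending rows for one file and shred the JSON into typed columns
select s.StagingId,
       j.Attribute1,
       j.Attribute72
from dbo.FileRowStaging as s
cross apply openjson(s.RowJson)
    with (Attribute1  nvarchar(100)  '$.Attribute1',
          Attribute72 decimal(18, 2) '$.Attribute72') as j
where s.Status = 0
  and s.FileName = @FileName;   -- @FileName supplied by the worker service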
Option 2 (using RabbitMQ)
Service 1: Read the content from the given file and send every row as a message to a queue. If a file contains 1 million records, this service will publish 1 million messages to the queue.
Service 2: Listen to the queue created in step 1, process those messages (validation and calculation), and save the final results into a separate table.
POC experience with Option 1:
It took 5 minutes to read and batch-save the data into the table for 100K records (the job of Service 1).
If the application tries to process multiple files in parallel, each containing 200K records, I sometimes see deadlocks.
No indexes or relationships are created on this batch-processing table.
Saving 3,000 records per batch to avoid table locks.
While the services are processing, results are trackable and progress can be queried; say, for "File 1.JSON", 50000 records processed successfully and the remaining 1000 in progress.
If Service 1 finishes its job correctly and something goes wrong with Service 2, we still have good control for reprocessing those records, as they are persisted in the database.
I am planning to delete the data in the batch-processing table with a nightly SQL job, once all records have been processed by Service 2, so the table is fresh and ready to store the data for the next day's processing (a chunked-delete sketch is below).
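A sketch of that nightly cleanup as a chunked delete, which keeps locks short and stays under the point where SQL Server tries to escalate to a table lock (roughly 5,000 locks per statement); table and column names follow the hypothetical staging sketch above:
declare @BatchSize int = 3000;

while 1 = 1
begin
    -- remove only rows Service 2 has already processed
    delete top (@BatchSize)
    from dbo.FileRowStaging
    where Status = 1;

    if @@rowcount < @BatchSize
        break;
end;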
POC experience with Option 2:
Producing (Service 1) and consuming (Service 2) the messages for a 100K-record file took around 2 hours 30 minutes.
No storage of file data in the database, so no deadlocks (unlike Option 1).
Results are not as trackable as in Option 1 while the services are processing the records, which matters for sharing the status with the clients who sent the file for processing.
We can see the status of messages on the RabbitMQ management screen for monitoring purposes.
If Service 1 partially reads the data from a given file and errors out, there is, to my knowledge, no way to roll back messages already published to RabbitMQ, so the consumer keeps working on those published messages.
I can horizontally scale the application with either option to speed up processing.
Given the facts above, both options have advantages and disadvantages. Is this a good use case for RabbitMQ? Is it advisable to produce and consume millions of records through RabbitMQ? Is there a better way to deal with this use case apart from these two options?
Please advise.
*** I am using .NET Core 5.0 and SQL Server 2019. Service 1 and Service 2 are .NET Core worker services (Windows jobs). All tests were done on my local machine, with RabbitMQ installed in Docker (also on my local machine).

How can I remove Host Instance Zombies from BTMessageBox

After moving most of our BT applications from a BizTalk 2009 to a BizTalk 2010 environment, we began the work of removing old applications and unused hosts. In this process we ended up with a zombie host instance.
This resulted in bts_CleanupDeadProcesses starting to fail with the error "Executed as user: RH\sqladmin. Could not find stored procedure 'dbo.int_ProcessCleanup_ProcessLabusHost'. [SQLSTATE 42000] (Error 2812). The step failed."
After looking at the CleanupDeadProcesses process, I found the zombie host instance in the BTMsgBox.ProcessHeartBeats table, with dtNextHeartbeatTime set to the time when the host was removed.
(I'm assuming that the Host Instance Processes don't exist in your services any longer, and that the SQL Agent job fails)
From looking at the source of the [dbo].[bts_CleanupDeadProcesses] job, it loops through the dbo.ProcessHeartbeats table with a cursor (btsProcessCurse, lol) looking for 'dead' heartbeats.
Each process instance has its own cleanup sproc, int_ProcessCleanup_[HostName], and a sproc for the heartbeat watchdog to call, viz. bts_ProcessHeartbeat_[HostName] (although FWR the sproc calls it @ApplicationName), filtered by WHERE (s.dtNextHeartbeatTime < @dtCurrentTime).
It is thus tempting to just delete the record for your deleted/zombie host (or, if you aren't that brave, to simply update dtNextHeartbeatTime on the heartbeat record for your dead host instance to sometime next century). Either way, the SQL Agent job should then skip the dead instance.
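A sketch of what that could look like against the MessageBox database (default name BizTalkMsgBoxDb; inspect first and run inside a transaction, since the exact column set of ProcessHeartbeats varies between BizTalk versions and the stale-row filter below is only an assumption):
use BizTalkMsgBoxDb;   -- default MessageBox name; yours may differ

-- look first: the zombie row is the one whose dtNextHeartbeatTime is stuck
-- at the time the host was removed
select * from dbo.ProcessHeartbeats order by dtNextHeartbeatTime;

-- anything comfortably older than live heartbeats; adjust to match the zombie row
declare @StaleCutoff datetime = dateadd(day, -30, getutcdate());

-- option A (less brave): push the dead heartbeat into the next century
update dbo.ProcessHeartbeats
set dtNextHeartbeatTime = '21000101'
where dtNextHeartbeatTime < @StaleCutoff;

-- option B: delete the zombie record outright
-- delete from dbo.ProcessHeartbeats
-- where dtNextHeartbeatTime < @StaleCutoff;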
An alternative could be to try to re-create the host and instances with the same names through the Admin Console, just to delete them (properly) again. This might, however, cause additional problems, as BizTalk won't be able to create the two sprocs above because of the undeleted objects.
Obviously, though, I wouldn't do this on your prod environment until you've confirmed it works with a trial run first.
It looks like someone else got stuck with a similar situation here
And there is also a good dive into the details of how the heartbeat mechanism works by XiaoDong Zhu here
Have you tried BTSTerminator? That works for one-off cleanups.
http://www.microsoft.com/en-us/download/details.aspx?id=2846

Busy on SELECT in SQLite with multiple connections

In my (test) application I have multiple threads where each thread makes multiple SELECT calls one after another. Each thread has its own connection to the (same) SQLite database.
Everything works fine for a couple of calls but then I start getting SQLITE_BUSY errors ("database is locked") back from sqlite3_step().
I know I can circumvent this with sqlite3_busy_timeout() but I want to know why this is happening or whether this is expected behavior.
Please note that I'm only doing read operations (unless SELECT is considered a write operation nowadays), so SQLite should only acquire a shared lock, shouldn't it?
I'm on Windows (7 x64) and running SQLite 3.7.13.
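Not an explanation of why plain readers should collide, but for completeness, the two mitigations that usually come up are a busy timeout (which the question already mentions via sqlite3_busy_timeout()) and WAL mode, sketched here:
-- per-connection busy timeout; the PRAGMA form needs SQLite 3.7.15+,
-- on 3.7.13 use sqlite3_busy_timeout() from the C API instead
PRAGMA busy_timeout = 5000;

-- write-ahead logging lets readers coexist with a writer (SQLite 3.7.0+);
-- the setting persists in the database file
PRAGMA journal_mode = WAL;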

ASP.NET web app deadlocking - think it's caused by SQL Server locking

Our client's web app restarts suddenly at random intervals. For each restart, we've found an entry like this in the Windows Event Log:
Event Type: Warning
Event Source: W3SVC-WP
Event Category: None
Event ID: 2262
Date: 2/21/2010
Time: 1:33:52 PM
User: N/A
Computer: LIQUID-NXCFZ9DJ
Description:
ISAPI 'c:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\aspnet_isapi.dll' reported itself as unhealthy for the following reason: 'Deadlock detected'.
This has happened 10 times in 3 weeks, sometimes 2 or 3 times within a few hours, and sometimes going more than a week without happening.
In the crash dump that we have maybe 70-80 client connections, like so:
GET request for <path here>
Mapped To URL <mapped path>
HTTP Version HTTP/1.1
SSL Request False
Time alive 00:55:24
QueryString <query string here>
Request mapped to
HTTP Request State HTR_READING_CLIENT_REQUEST
Native Request State NREQ_STATE_PROCESS
(that's 55 minutes!!! there's no reason a client connection should be around that long)
Relevant entries in machine.config:
<system.net>
<connectionManagement>
<add address="*" maxconnection="200" />
</connectionManagement>
</system.net>
and (inside <system.web>):
<deployment retail="true" />
<!--<customErrors mode="Off"/>-->
<processModel autoConfig="true"
memoryLimit="60"
maxIoThreads="200"
minIoThreads="30"
minWorkerThreads="40"
maxWorkerThreads="200"
clientConnectedCheck="00:00:05" />
<httpRuntime
minFreeThreads="20"
minLocalRequestFreeThreads="10"
enableKernelOutputCache="false"
maxRequestLength="10240" />
This latest time we were able to look at it as it was happening, and we saw about 20 queries all in 'suspended' status in SQL Server. It looked like they could all have been related to one table (the Items table, a very central one for lots of different operations).
We weren't sure what the best thing to do was in the middle of the problem. When the crash occurred, SQL Server cleared out.
Any guidance on what's going on, or how to find out what's going on, would be much appreciated.
If it's a deadlock, it means it is a deadlock whose loop completes outside SQL, meaning you are trying to acquire process resources (i.e. a C# 'lock') while holding SQL resources (i.e. a transaction). To give an example of how this can happen, consider the following scenario:
T1 starts a SQL transaction and updates a table A in SQL
T2 locks an object in C#
T1 tries to lock the same object in C#, blocks on T2's lock
T2 reads from SQL table A, blocks on T1's update
T1 waits on T2 inside your process, T2 waits on T1 inside SQL: an undetectable deadlock
Situations like this cannot be detected by SQL's deadlock monitoring, since the deadlock loop completes outside SQL. How would you diagnose such a problem? For the SQL Server side of the loop you have a lot of powerful tools at your disposal, primarily sys.dm_exec_requests, which can tell you which requests are blocked by what. But unfortunately, on the app side of the loop there is no out-of-the-box instrumentation, so you are on your own there. An experienced eye can detect the problem on code inspection (doing SQL calls while holding C# locks, or acquiring C# locks in the middle of SQL transactions, is a big giveaway); otherwise you have to either exercise some masterful WinDbg-fu or instrument the code.
You should also consider that this may not be a deadlock at all. You could have your 20 SQL requests blocked by an ordinary code defect in your application, like a transaction leak on certain requests (i.e. the requests wait for a transaction that blocks them to commit, but that transaction has leaked in code and will never be closed). Again, sys.dm_exec_requests is your friend.
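A sketch of the kind of sys.dm_exec_requests query meant here, listing each blocked request, what it is waiting on, and which session is blocking it (run it while the stall is happening; the column choice is just illustrative):
select r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time as wait_time_ms,
       r.status,
       t.text      as current_sql
from sys.dm_exec_requests as r
cross apply sys.dm_exec_sql_text(r.sql_handle) as t
where r.blocking_session_id <> 0
order by r.wait_time desc;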
Check the running processes in SQL server using the Activity monitor.
UPDATE: I saw that this specific error is probably not SQL. I found this article on how to generate more info on the deadlock: http://support.microsoft.com/?ID=828222
