How commonly do deadlock issues occur in programming? - deadlock

I've programmed in a number of languages, but I am not aware of deadlocks in my code.
I took this to mean it doesn't happen.
Does this happen frequently (in programming, not in the databases) enough that I should be concerned about it?

Deadlocks could arise if two conditions are true: you have mutilple theads, and they contend for more than one resource.
Do you write multi-threaded code? You might do this explicitly by starting your own threads, or you might work in a framework where the threads are created out of your sight, and so you're running in more than one thread without you seeing that in your code.
An example: the Java Servlet API. You write a servlet or JSP. You deploy to the app server. Several users hit your web site, and hence your servlet. The server will likely have a thread per user.
Now consider what happens if in servicing the requests you want to aquire some resources:
if ( user Is Important ){
getResourceA();
}
getResourceB();
if (today is Thursday ) {
getResourceA();
}
// some more code
releaseResourceA();
releaseResoruceB();
In the contrived example above, think about what might happen on a Thursday when an important user's request arrives, and more or less simultaneously an unimportant user's request arrives.
The important user's thread gets Resoruce A and wants B. The less important user gets resource B and wants A. Neither will let go of the resource that they already own ... deadlock.
This can actually happen quite easily if you are writing code that explicitly uses synchronization. Most commonly I see it happen when using databases, and fortunately databases usually have deadlock detection so we can find out what error we made.
Defense against deadlock:
Acquire resources in a well defined order. In the aboce example, if resource A was always obtained before resource B no deadlock would occur.
If possible use timeouts, so that you don't wait indefinately for a resource. This will allow you to detect contention and apply defense 1.

It would be very hard to give an idea of how often it happens in reality (in production code? in development?) and that wouldn't really give a good idea of how much code is vulnerable to it anyway. (Quite often a deadlock will only occur in very specific situations.)
I've seen a few occurrences, although the most recent one I saw was in an Oracle driver (not in the database at all) due to a finalizer running at the same time as another thread trying to grab a connection. Fortunately I found another bug which let me avoid the finalizer running in the first place...
Basically deadlock is almost always due to trying to acquire one lock (B) whilst holding another one (A) while another thread does exactly the same thing the other way round. If one thread is waiting for B to be released, and the thread holding B is waiting for A to be released, neither is willing to let the other proceed.
Make sure you always acquire locks in the same order (and release them in the reverse order) and you should be able to avoid deadlock in most cases.
There are some odd cases where you don't directly have two locks, but it's the same basic principle. For example, in .NET you might use Control.Invoke from a worker thread in order to update the UI on the UI thread. Now Invoke waits until the update has been processed before continuing. Suppose your background thread holds a lock with the update requires... again, the worker thread is waiting for the UI thread, but the UI thread can't proceed because the worker thread holds the lock. Deadlock again.
This is the sort of pattern to watch out for. If you make sure you only lock where you need to, lock for as short a period as you can get away with, and document the thread safety and locking policies of all your code, you should be able to avoid deadlock. Like all threading topics, however, it's easier said than done.

If you get a chance take a look at first few chapters in Java Concurrency in Practice.
Deadlocks can occur in any concurrent programming situation, so it depends how much concurrency you deal with. Several examples of concurrent programming are: multi-process, multi-thread, and libraries introducing multi-thread. UI frameworks, event handling (such as timer event) could be implemented as threads. Web frameworks could spawn threads to handle multiple web requests simultaneously. With multicore CPUs you might see more concurrent situations visibly than before.
If A is waiting for B, and B is waiting for A, the circular wait causes the deadlock. So, it also depends on the type of code you write as well. If you use distributed transactions, you can easily cause that type of scenario. Without distributed transactions, you risk bank accounts from stealing money.

All depends on what you are coding. Traditional single threaded applications that do not use locking. Not really.
Multi-threaded code with multiple locks is what will cause deadlocks.
I just finished refactoring code that used seven different locks without proper exception handling. That had numerous deadlock issues.

A common cause of deadlocks is when you have different threads (or processes) acquire a set of resources in different order.
E.g. if you have some resource A and B, if thread 1 acquires A and then B, and thread 2 acquires B and then A, then this is a deadlock waiting to happen.
There's a simple solution to this problem: have all your threads always acquire resources in the same order. E.g. if all your threads acquire A and B in that order, you will avoid deadlock.

A deadlock is a situation with two processes are dependent on each other - one cannot finish before the other. Therefore, you will likely only have a deadlock in your code if you are running multiple code flows at any one time.
Developing a multi-threaded application means you need to consider deadlocks. A single-threaded application is unlikely to have deadlocks - but not impossible, the obvious example being that you may be using a DB which is subject to deadlocking.

Related

How to kill a thread, stop a promise execution in Raku

I am looking for a wait to stop (send an exception) to a running promise on SIGINT. The examples given in the doc exit the whole process and not just one worker.
Does someone know how to "kill", "unschedule", "stop" a running thread ?
This is for a p6-jupyter-kernel issue or this REPL issue.
Current solution is restarting the repl but not killing the blocked thread
await Promise.anyof(
start {
ENTER $running = True;
LEAVE $running = False;
CATCH {
say $_;
reset;
}
$output :=
self.repl-eval($code,:outer_ctx($!save_ctx),|%adverbs);
},
$ctrl-c
);
Short version: don't use threads for this, use processes. Killing the running process probably is the best thing that can be achieved in this situation in general.
Long answer: first, it's helpful to clear up a little confusion in the question.
First of all, there's no such thing as a "running Promise"; a Promise is a data structure for conveying a result of an asynchronous operation. A start block is really doing three things:
Creating a Promise (which it evaluates to)
Scheduling some code to run
Arranging that the outcome of running that code is reflected by keeping or breaking the Promise
That may sound a little academic, but really matters: a Promise has no awareness of what will ultimately end up keeping or breaking it.
Second, a start block is not - at least with the built-in scheduler - backed by a thread, but rather runs on the thread pool. Even if you could figure out a way to "take out" the thread, the thread pool scheduler is not going to be happy with having one of the threads it expects to eat from the work queue on disappear. You could write your own scheduler that really does back work with a fresh thread each time, but that still isn't a complete solution: what if the piece of code the user has requested execution of schedules work of its own, and then awaits that? Then there is no one thread to kill to really bring things to a halt.
Let's assume, however, that we did manage to solve all of this, and we get ourselves a list of one or more threads that we really want to kill without their cooperation (cooperative situations are fairly easy; we use a Promise and have code poll that every so often and die if that cancellation Promise is ever kept/broken).
Any such mechanism that wants to be able to stop a thread blocked on anything (not just compute, but also I/O, locking, etc.) would need deep integration and cooperation from the underlying runtime (such as MoarVM). For example, trying to cancel a thread that is currently performing garbage collection will be a disaster (most likely deadlocking the VM as a whole). Other unfortunate cancellation times could lead to memory corruption if it was half way through an operation that is not safe to interrupt, deadlocks elsewhere if the killed thread was holding locks, and so forth. Thus one would need some kind of safe-pointing mechanism. (We already have things along those lines in MoarVM to know when it's safe to GC, however cancellation implies different demands. It probably cross-cuts numerous parts of the VM codebase.)
And that's not all: the same situation repeats at the Raku language level too. Lock::Async, for example, is not a kind of lock that the underlying runtime is aware of. Probably the best one can do is try to tear down the callstack and run all of the LEAVE phasers; that way there's some hope (if folks used the .protect method; if they just called lock and unlock explicitly, we're done for). But even if we manage not to leak resources (already a big ask), we still don't know - in general - if the code we killed has left the world in any kind of consistent state. In a REPL context this could lead to dubious outcomes in follow-up executions that access the same global state. That's probably annoying, but what really frightens me is folks using such a cancellation mechanism in a production system - which they will if we implement it.
So, effectively, implementing such a feature would entail doing a significant amount of difficult work on the runtime and Rakudo itself, and the result would be a huge footgun (I've not even enumerated all the things that could go wrong, just the first few that came to mind). By contrast, killing a process clears up all resources, and a process has its own memory space, so there's no consistency worries either.
There is currently no way to stop a thread if it doesn't want to be stopped.
A thread can check a flag every so often, and decide to call it quits if that flag is set. It would be very nice if we would have a way to throw an exception inside a thread from another thread. But we do not, at least not as far as I know.

Using 'Lock' in web applications

A few months ago I was interviewing for a job inside the company I am currently in, I dont have a strong web development background, but one of the questions he posed to me was how could you improve this block of code.
I dont remember the code block perfectly but to sum it up it was a web hit counter, and he used lock on the hitcounter.
lock(HitCounter)
{
// Bla...
}
However after some discussion he said, lock is good but never use it in web applications!
What is the basis behind his statement? Why shouldnt I use lock in web applications?
There is no special reason why locks should not be used in web applications. However, they should be used carefully as they are a mechanism to serialize multi-threaded access which can cause blocking if lock blocks are contended. This is not just a concern for web applications though.
What is always worth remembering is that on modern hardware an uncontended lock takes 20 nanoseconds to flip. With this in mind, the usual practice of trying to make code inside of lock blocks as minimal as possible should be followed. If you have minimal code within a block, the overhead is quite small and potential for contention low.
To say that locks should never be used is a bit of a blanket statement really. It really depends on what your requirements are e.g. a thread-safe in-memory cache to be shared between requests will potentially result in less request blocking than on-demand fetching from a database.
Finally, BCL and ASP.Net Framework types certainly use locks internally, so you're indirectly using them anyway.
The application domain might be recycled.
This might result in the old appdomain still finishing serving some requests and the new appdomain also serving new requests.
Static variables are not shared between them, so locking on a static global would not grant exclusivity in this case.
First of all, you never want to lock an object that you actually use in any application. You want to create a lock object and lock that:
private readonly object _hitCounterLock = new object();
lock(_hitCounterLock)
{
//blah
}
As for the web portion of the question, when you lock you block every thread that attempts to access the object (which for the web could be hundreds or thousands of users). They will all be waiting until each thread ahead of them unlocks.
Late :), but for future readers of this, an additional point:
If the application is run on a web farm, the ASP's running on multiple machines will not share the lock object
So this can only work if
1. No web farm has to be supported AND 2. ASP is configured (non-default) NOT to use parallel instances during recycle until old requests are served (as mentioned by Andras above)
This code will create a bottleneck for your application since all incoming request will have to wait at this point before the previous went out of the lock.
lock is only intended to be used for multithreaded applications where multiple threads require access to the same shared variable, thus a lock is exclusively acquired by the requesting thread and all pending threads will block and wait until the lock is released.
in web applications, user requests are isolated so there is no need for locking by default
Couple reasons...
If you're trying to lock a database read/write operation, there's a really high risk of a race condition happening anyway because the database isn't owned by the process doing the lock, so it could be read from/written to by another process -- perhaps even a hypothetical future version of IIS that runs multiple processes per application.
Locks are typically used in client applications for non-UI threads, i.e. background/worker threads. Web applications don't have as much of a use for multithreaded processing unless you're trying to take advantage of multiple cores (in which case locks on request-associated objects would be acceptable), because each request can be assumed to run on its own thread, and the server can't respond until it's processed the entire output (or at least a sequential chunk) anyway.

Is there a safe way to use ReaderWriterLockSlim within ASP.NET?

Joe Duffy's article about ReaderWriterLockSlim does not fill me with confidence!
Introducing the new ReaderWriterLockSlim in Orcas
The lock is not robust to asynchronous exceptions such as thread aborts and out of memory conditions. If one of these occurs while in the middle of one of the lock’s methods, the lock state can be corrupt, causing subsequent deadlocks, unhandled exceptions, and (sadly) due to the use of spin locks internally, a pegged 100% CPU.
How can I safely use ReaderWriterLockSlim in ASP.NET?
Is your ASP.NET application regularly encountering thread aborts (from other threads) or trying to survive OutOfMemoryExceptions? If not, I can't see that the post is too worrying... and if it is, I'd argue you've got bigger problems.
In particular, note this bit:
There are some downsides to the new lock, however, that may cause programmers writing hosted
or low-level reliability-sensitive code to wait to adopt it.
Don’t get me wrong, most people really don’t need to worry about these topics, so I
apologize if my words of warning have scared you off: but those that do really need to be
told about the state of affairs.
Now yes, ASP.NET is "hosted", but it's not quite as severe as the SQL Server CLR hosting. I don't think you need to worry.

Long-running ASP.NET tasks

I know there's a bunch of APIs out there that do this, but I also know that the hosting environment (being ASP.NET) puts restrictions on what you can reliably do in a separate thread.
I could be completely wrong, so please correct me if I am, this is however what I think I know.
A request typically timeouts after 120 seconds (this is configurable) but eventually the ASP.NET runtime will kill a request that's taking too long to complete.
The hosting environment, typically IIS, employs process recycling and can at any point decide to recycle your app. When this happens all threads are aborted and the app restarts. I'm however not sure how aggressive it is, it would be kind of stupid to assume that it would abort a normal ongoing HTTP request but I would expect it to abort a thread because it doesn't know anything about the unit of work of a thread.
If you had to create a programming model that easily and reliably and theoretically put a long running task, that would have to run for days, how would you accomplish this from within an ASP.NET application?
The following are my thoughts on the issue:
I've been thinking a long the line of hosting a WCF service in a win32 service. And talk to the service through WCF. This is however not very practical, because the only reason I would choose to do so, is to send tasks (units of work) from several different web apps. I'd then eventually ask the service for status updates and act accordingly. My biggest concern with this is that it would NOT be a particular great experience if I had to deploy every task to the service for it to be able to execute some instructions. There's also this issue of input, how would I feed this service with data if I had a large data set and needed to chew through it?
What I typically do right now is this
SELECT TOP 10 *
FROM WorkItem WITH (ROWLOCK, UPDLOCK, READPAST)
WHERE WorkCompleted IS NULL
It allows me to use a SQL Server database as a work queue and periodically poll the database with this query for work. If the work item completed with success, I mark it as done and proceed until there's nothing more to do. What I don't like is that I could theoretically be interrupted at any point and if I'm in-between success and marking it as done, I could end up processing the same work item twice. I might be a bit paranoid and this might be all fine but as I understand it there's no guarantee that that won't happen...
I know there's been similar questions on SO before but non really answers with a definitive answer. This is a really common thing, yet the ASP.NET hosting environment is ill equipped to handle long-running work.
Please share your thoughts.
Have a look at NServiceBus
NServiceBus is an open source
communications framework for .NET with
build in support for publish/subscribe
and long-running processes.
It is a technology build upon MSMQ, which means that your messages don't get lost since they are persisted to disk. Nevertheless the Framework has an impressive performance and an intuitive API.
John,
I agree that ASP.NET is not suitable for Async tasks as you have described them, nor should it be. It is designed as a web hosting platform, not a back of house processor.
We have had similar situations in the past and we have used a solution similar to what you have described. In summary, keep your WCF service under ASP.NET, use a "Queue" table with a Windows Service as the "QueueProcessor". The client should poll to see if work is done (or use messaging to notify the client).
We used a table that contained the process and it's information (eg InvoicingRun). On that table was a status (Pending, Running, Completed, Failed). The client would submit a new InvoicingRun with a status of Pending. A Windows service (the processor) would poll the database to get any runs that in the pending stage (you could also use SQL Notification so you don't need to poll. If a pending run was found, it would move it to running, do the processing and then move it to completed/failed.
In the case where the process failed fatally (eg DB down, process killed), the run would be left in a running state, and human intervention was required. If the process failed in an non-fatal state (exception, error), the process would be moved to failed, and you can choose to retry or have human intervantion.
If there were multiple processors, the first one to move it to a running state got that job. You can use this method to prevent the job being run twice. Alternate is to do the select then update to running under a transaction. Make sure either of these outside a transaction larger transaction. Sample (rough) SQL:
UPDATE InvoicingRun
SET Status = 2 -- Running
WHERE ID = 1
AND Status = 1 -- Pending
IF ##RowCount = 0
SELECT Cast(0 as bit)
ELSE
SELECT Cast(1 as bit)
Rob
Use a simple background tasks / jobs framework like Hangfire and apply these best practice principals to the design of the rest of your solution:
Keep all actions as small as possible; to achieve this, you should-
Divide long running jobs into batches and queue them (in a Hangfire queue or on a bus of another sort)
Make sure your small jobs (batched parts of long jobs) are idempotent (have all the context they need to run in any order). This way you don't have to use a quete which maintains a sequence; because then you can
Parallelise the execution of jobs in your queue depending on how many nodes you have in your web server farm. You can even control how much load this subjects your farm to (as a trade off to servicing web requests). This ensures that you complete the whole job (all batches) as fast and as efficiently as possible, while not compromising your cluster from servicing web clients.
Have thought about the use the Workflow Foundation instead of your custom implementation? It also allows you to persist states. Tasks could be defined as workflows in this case.
Just some thoughts...
Michael

What are common reasons for deadlocks?

Deadlocks are hard to find and very uncomfortable to remove.
How can I find error sources for deadlocks in my code? Are there any "deadlock patterns"?
In my special case, it deals with databases, but this question is open for every deadlock.
Update: This recent MSDN article, Tools And Techniques to Identify Concurrency Issues, might also be of interest
Stephen Toub in the MSDN article Deadlock monitor states the following four conditions necessary for deadlocks to occur:
A limited number of a particular resource. In the case of a monitor in C# (what you use when you employ the lock keyword), this limited number is one, since a monitor is a mutual-exclusion lock (meaning only one thread can own a monitor at a time).
The ability to hold one resource and request another. In C#, this is akin to locking on one object and then locking on another before releasing the first lock, for example:
lock(a)
{
...
lock(b)
{
...
}
}
No preemption capability. In C#, this means that one thread can't force another thread to release a lock.
A circular wait condition. This means that there is a cycle of threads, each of which is waiting for the next to release a resource before it can continue.
He goes on to explain that the way to avoid deadlocks is to avoid (or thwart) condition four.
Joe Duffy discusses several techniques
for avoiding and detecting deadlocks,
including one known as lock leveling.
In lock leveling, locks are assigned
numerical values, and threads must
only acquire locks that have higher
numbers than locks they have already
acquired. This prevents the
possibility of a cycle. It's also
frequently difficult to do well in a
typical software application today,
and a failure to follow lock leveling
on every lock acquisition invites
deadlock.
The classic deadlock scenario is A is holding lock X and wants to acquire lock Y, while B is holding lock Y and wants to acquire lock X. Since neither can complete what they are trying to do both will end up waiting forever (unless timeouts are used).
In this case a deadlock can be avoided if A and B acquire the locks in the same order.
No deadlock patterns to my knowledge (and 12 years of writing heavily multithreaded trading applications).. But the TimedLock class has been of great help in finding deadlocks that exist in code without massive rework.
http://www.randomtree.org/eric/techblog/archives/2004/10/multithreading_is_hard.html
basically, (in dotnet/c#) you search/replace all your "lock(xxx)" statements with "using TimedLock.Lock(xxx)"
If a deadlock is ever detected (lock unable to be obtained within the specified timeout, defaults to 10 seconds), then an exception is thrown. My local version also immediately logs the stacktrace. Walk up the stacktrace (preferably debug build with line numbers) and you'll immediately see what locks were held at the point of failure, and which one it was attempting to get.
In dotnet 1.1, in a deadlock situation as described, as luck would have it all the threads which were locked would throw the exception at the same time. So you'd get 2+ stacktraces, and all the information necessary to fix the problem. (2.0+ may have changed the threading model internally enough to not be this lucky, I'm not sure)
Making sure all transactions affect tables in the same order is the key to avoiding the most common of deadlocks.
For example:
Transaction A
UPDATE Table A SET Foo = 'Bar'
UPDATE Table B SET Bar = 'Foo'
Transaction B
UPDATE Table B SET Bar = 'Foo'
UPDATE Table A SET Foo = 'Bar'
This is extremely likely to result in a deadlock as Transaction A gets a lock on Table A, Transaction B gets a lock on table B, therefore neither of them get a lock for their second command until the other has finished.
All other forms of deadlocks are generally caused through high intensity use and SQL Server deadlocking internally whilst allocated resources.
Yes - deadlocks occur when processes try to acquire resources in random order. If all your processes try to acquire the same resources in the same order, the possibilities for deadlocks are greatly reduced, if not eliminated.
Of course, this is not always easy to arrange...
The most common (according to my unscientific observations) DB deadlock scenario is very simple:
Two processes read something (a DB record for example), both acquire a shared lock on the associated resource (usually a DB page),
Both try to make an update, trying to upgrade their locks to exclusive ones - voila, deadlock.
This can be avoided by specifying the "FOR UPDATE" clause (or similar, depending on your particular RDBMS) if the read is to be followed by an update. This way the process gets the exclusive lock from the start, making the above scenario impossible.
I recommend reading this article by Herb Sutter. It explains the reasons behind deadlocking issues and puts forward a framework in this article to tackle this problem.
The typical scenario are mismatched update plans (tables not always updated in the same order). However it is not unusual to have deadlocks when under high processing volume.
I tend to accept deadlocks as a fact of life, it will happen one day or another so I have my DAL prepared to handle and retry a deadlocked operation.
A condition that occure whene two process are each waiting for the othere to complete befoure preceding.the result is both procedure is hang.
its most comonelly multitasking and clint/server.
Deadlock occurs mainly when there are multiple dependent locks exist. In a thread and another thread tries to lock the mutex in reverse order occurs. One should pay attention to use a mutex to avoid deadlocks.
Be sure to complete the operation after releasing the lock. If you have multiple locks, such as access order is ABC, releasing order should also be ABC.
In my last project I faced a problem with deadlocks in an sql Server Database. The problem in finding the reason was, that my software and a third party software are using the same Database and are working on the same tables. It was very hard to find out, what causes the deadlocks. I ended up writing an sql-query to find out which processes an which sql-Statements are causing the deadlocks. You can find that statement here: Deadlocks on SQL-Server
To avoid the deadlock there is a algorithm called Banker's algorithm.
This one also provides helpful information to avoid deadlock.

Resources