How to kill a thread, stop a promise execution in Raku - asynchronous

I am looking for a way to stop (send an exception to) a running promise on SIGINT. The examples given in the docs exit the whole process and not just one worker.
Does someone know how to "kill", "unschedule", "stop" a running thread ?
This is for a p6-jupyter-kernel issue or this REPL issue.
The current solution restarts the REPL but does not kill the blocked thread:
await Promise.anyof(
    start {
        ENTER $running = True;
        LEAVE $running = False;
        CATCH {
            say $_;
            reset;
        }
        $output :=
            self.repl-eval($code, :outer_ctx($!save_ctx), |%adverbs);
    },
    $ctrl-c
);

Short version: don't use threads for this, use processes. Killing the running process probably is the best thing that can be achieved in this situation in general.
Long answer: first, it's helpful to clear up a little confusion in the question.
First of all, there's no such thing as a "running Promise"; a Promise is a data structure for conveying a result of an asynchronous operation. A start block is really doing three things:
Creating a Promise (which it evaluates to)
Scheduling some code to run
Arranging that the outcome of running that code is reflected by keeping or breaking the Promise
That may sound a little academic, but really matters: a Promise has no awareness of what will ultimately end up keeping or breaking it.
Second, a start block is not - at least with the built-in scheduler - backed by a thread of its own, but rather runs on the thread pool. Even if you could figure out a way to "take out" the thread, the thread pool scheduler is not going to be happy about one of the threads it expects to eat from the work queue disappearing. You could write your own scheduler that really does back work with a fresh thread each time, but that still isn't a complete solution: what if the piece of code the user has requested execution of schedules work of its own, and then awaits that? Then there is no one thread to kill to really bring things to a halt.
Let's assume, however, that we did manage to solve all of this, and we get ourselves a list of one or more threads that we really want to kill without their cooperation (cooperative situations are fairly easy; we use a Promise and have code poll that every so often and die if that cancellation Promise is ever kept/broken).
Any such mechanism that wants to be able to stop a thread blocked on anything (not just compute, but also I/O, locking, etc.) would need deep integration and cooperation from the underlying runtime (such as MoarVM). For example, trying to cancel a thread that is currently performing garbage collection will be a disaster (most likely deadlocking the VM as a whole). Other unfortunate cancellation times could lead to memory corruption if it was half way through an operation that is not safe to interrupt, deadlocks elsewhere if the killed thread was holding locks, and so forth. Thus one would need some kind of safe-pointing mechanism. (We already have things along those lines in MoarVM to know when it's safe to GC, however cancellation implies different demands. It probably cross-cuts numerous parts of the VM codebase.)
And that's not all: the same situation repeats at the Raku language level too. Lock::Async, for example, is not a kind of lock that the underlying runtime is aware of. Probably the best one can do is try to tear down the callstack and run all of the LEAVE phasers; that way there's some hope (if folks used the .protect method; if they just called lock and unlock explicitly, we're done for). But even if we manage not to leak resources (already a big ask), we still don't know - in general - if the code we killed has left the world in any kind of consistent state. In a REPL context this could lead to dubious outcomes in follow-up executions that access the same global state. That's probably annoying, but what really frightens me is folks using such a cancellation mechanism in a production system - which they will if we implement it.
So, effectively, implementing such a feature would entail doing a significant amount of difficult work on the runtime and Rakudo itself, and the result would be a huge footgun (I've not even enumerated all the things that could go wrong, just the first few that came to mind). By contrast, killing a process clears up all resources, and a process has its own memory space, so there's no consistency worries either.
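The "use processes" advice can be sketched as follows. This is a minimal illustration in Python rather than Raku; the helper name `run_with_timeout` is hypothetical, and a timeout stands in for the SIGINT a real REPL would react to.

```python
import subprocess
import sys

def run_with_timeout(code: str, timeout: float) -> str:
    """Run `code` in a separate interpreter process; kill it if it overruns.

    Hypothetical helper: a real REPL would react to SIGINT, not a timeout.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
        return out
    except subprocess.TimeoutExpired:
        proc.kill()          # the OS reclaims the child's memory and handles
        proc.communicate()   # reap the child so it does not linger as a zombie
        return "<killed>"

result_ok = run_with_timeout("print(6 * 7)", timeout=10)
result_stuck = run_with_timeout("import time; time.sleep(60)", timeout=0.5)
```

Because the stuck code lives in its own process, killing it cannot leave the parent's memory or locks in an inconsistent state.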

There is currently no way to stop a thread if it doesn't want to be stopped.
A thread can check a flag every so often, and decide to call it quits if that flag is set. It would be very nice if we would have a way to throw an exception inside a thread from another thread. But we do not, at least not as far as I know.
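The flag-checking approach might look roughly like this. It is a Python sketch rather than Raku, and the names `stop_requested` and `worker` are made up for illustration.

```python
import threading
import time

stop_requested = threading.Event()   # the shared "flag"
iterations = 0

def worker() -> None:
    global iterations
    while not stop_requested.is_set():   # check the flag every so often
        iterations += 1
        time.sleep(0.01)                 # stand-in for one unit of real work

t = threading.Thread(target=worker)
t.start()
time.sleep(0.1)
stop_requested.set()    # ask the worker to quit; it decides when to comply
t.join(timeout=2)
stopped = not t.is_alive()
```

This only works if the worker cooperates: a thread that never looks at the flag, or that is blocked inside a long call, will not stop.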

Related

Are the asyncio and node.js event loops truly asynchronous?

I'm assuming that the definition of asynchronous is as follows.
Let's start with a relationship between two 'things': X and Y.
They can be anything, e.g. X can be you and Y can be your washing machine.
Let's say X requests something of Y.
This can also be anything: a question, a task.
Let's say we live in a world where Y cannot immediately respond with the answer / completion status.
What happens?
In a Synchronous relationship, you 'wait around' in some way.
This could involve just sitting there or asking repeatedly.
In an Asynchronous relationship, you go on with your life.
Y will ping you when it's done.
From the perspective of a user's API, node.js and asyncio seem asynchronous. For example, in node.js you can register callbacks upon completion of certain events. And in asyncio, the callback logic goes right after some await my_io().
But here's my question - are node.js and asyncio actually truly asynchronous? Implementation-wise, do they just engage in a bunch of frantic non-blocking "hey, is this file descriptor free yet?" calls or is it actually interrupt-driven?
Yes, they're truly asynchronous by your definition: you (i.e. the Node engine/Python interpreter) go on doing other work while waiting for the task to complete.
How it's implemented doesn't matter to you--you trust that the designers made the right design decisions on your behalf. If there is frantic "hey is this file descriptor free yet?" going on (that's called "polling" or "spin waiting"), that's an implementation detail handled by the engine which has dispatched a thread to wait.
Incidentally, busy waiting is sometimes an efficient way to wait for a resource in certain circumstances (generally, when you expect it to be available very soon) as an alternative to an interrupt or notification.
Think of it this way: if you're waiting for your washing machine to finish a load and you want to be notified precisely (or as precisely as possible) when it finishes as you go about other tasks, you could call your friend to watch the laundry for you. The contract is that your friend notifies you as soon as possible when the laundry is done, but you don't care how they do it. Maybe they stand next to the machine and check if it's done constantly (a good idea if the load is almost complete), or maybe they're clever and have devised a system so they can do other work without constantly checking. Either way, it doesn't impact your ability to do other tasks.
The "friend" is like a thread dispatched by the Node engine and the "laundry" might be a file, HTTP response, etc. From your perspective, it's truly asynchronous--the work spent to fulfill the resource request is being done by a thread spawned by the runtime, the OS or network and runs in parallel to your process.
See this answer for diagrams showing the distinction between asynchrony and synchrony.
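To make the definition concrete, here is a small asyncio sketch of "go on with your life and get pinged when it's done". The coroutine names are invented for the analogy, and `asyncio.sleep` stands in for a real I/O request.

```python
import asyncio

log = []

async def washing_machine() -> str:
    await asyncio.sleep(0.2)      # stand-in for a slow I/O request
    return "laundry done"

async def other_chores() -> None:
    for i in range(3):
        log.append(f"chore {i}")
        await asyncio.sleep(0.01)

async def main() -> None:
    laundry = asyncio.create_task(washing_machine())  # X requests something of Y
    await other_chores()                              # X goes on with its life
    log.append(await laundry)                         # Y's result arrives when ready

asyncio.run(main())
```

The chores are all logged before the laundry result, even though the laundry was requested first.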

Why is async programming faster

I keep hearing that using async programming patterns will make my code run faster. Why is that true? Doesn't the same exact code have to run either way, whether it runs now or it runs later?
It's not faster, it just doesn't waste time.
Synchronous code stops processing when waiting for I/O. Which means that when you're reading a file you can't run any other code. Now, if you have nothing else to do while that file is being read then asynchronous code would not buy you anything much.
Usually the additional CPU time that you can use is useful for servers. So the question is why do asynchronous programming instead of starting up a new thread for each client?
It turns out that starting and tearing down threads is expensive. Some time back in the early 2000s a web server benchmark found that tclhttpd compared favorably to Apache for serving static image files. This is despite the fact that tclhttpd was written in tcl and Apache was written in C and tcl was known to be 50 times slower than C. Tcl managed to hold its own against Apache because tcl had an easy to use asynchronous I/O API. So tclhttpd used it.
It's not that C doesn't have asynchronous I/O API. It's just that they're rarely used. So Apache didn't use it. These days, Apache2 uses asynchronous I/O internally along with thread pools. The C code ends up looking more complicated but it's faster - lesson learned.
Which leads us to the recent obsession with asynchronous programming. Why are people obsessed with it? (most answers on Stackoverflow about javascript programming for example insist that you should never use synchronous versions of asynchronous functions).
This goes back to how you rarely see asynchronous programs in C even though it's the superior way of doing things (GUI code is an exception because UI libraries learned early on to rely on asynchronous programming and Events). There are simply too many functions in C that are synchronous. So even if you wanted to do asynchronous programming you'll end up calling a synchronous function sooner or later. The alternative is to abandon stdlib and write your own asynchronous libraries for everything - from file I/O to networking to SQL.
So, in languages like javascript where asynchronous programming ended up as the default style there is pressure from other programmers to not mess it up and accidentally introduce synchronous functions which would be hard to integrate with asynchronous code without losing a lot of performance. So in the end, like taxes, asynchronous code has become a social contract.
It's not always faster. In fact, just setting up and tearing down the async environment adds a lot of time to your code. You have to spin off a new process/thread, set up an event queue/message pump, and clean up everything nicely in the end. (Even if your framework hides all these details from you, they're happening in the background).
The advantage is in avoiding blocking. Lots of our code depends on external resources. We need to query a database for the records to process, or download the latest version of something from a website. From the moment you ask that resource for information until you get an answer, your code has nothing to do. It's blocking, waiting for an answer. All the time your program spends blocking is totally wasted.
That's what async is designed for. By spinning the "wait for this blocking operation" code off into an async request, you let the rest of your non-blocking code keep running.
As a metaphor, imagine a manager telling his employee what to do that day. One of the tasks is a phone call to a company with long wait times. If he told her to make the call synchronously, she would call and wait on hold without doing anything else. Make it async and she can work on a lot of other tasks while the phone sits on hold in the background.
It runs the same code, but it does not wait for a time-consuming task to finish. It continues to execute other code until the async function is done.
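The "doesn't waste time" point can be illustrated with a rough timing sketch. The 0.2-second sleeps are arbitrary stand-ins for waiting on a database or website, and `fetch` is a hypothetical name.

```python
import asyncio
import time

async def fetch(delay: float) -> float:
    await asyncio.sleep(delay)   # simulated wait on an external resource
    return delay

async def sequential() -> float:
    start = time.monotonic()
    for _ in range(3):
        await fetch(0.2)         # each wait finishes before the next begins
    return time.monotonic() - start

async def concurrent() -> float:
    start = time.monotonic()
    await asyncio.gather(*(fetch(0.2) for _ in range(3)))  # waits overlap
    return time.monotonic() - start

seq_time = asyncio.run(sequential())
con_time = asyncio.run(concurrent())
```

The same three waits happen either way; the concurrent version simply overlaps them, so its wall-clock time is close to one wait instead of three.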

what are threads in actionscript functions?

I've seen a lot of other developers refer to threads in ActionScript functions. As a newbie I have no idea what they are referring to so:
What is a thread in this sense?
How would I run more than one thread at a time?
How do I ensure that I am only running one thread at a time?
Thanks
~mike
Threads are a way to have a program appear to perform several jobs concurrently, although whether the jobs actually run simultaneously depends on several factors (most importantly, whether the CPU the program is running on has multiple cores available to do the work). Threads are useful because they allow work to be done in one context without interfering with another context.
An example will help to illustrate why this is important. Suppose that you have a program which fetches the list of everyone in the phone book whose name matches some string. When people click the "search" button, it will trigger a costly and time-consuming search, which might not complete for a few seconds.
If you have only a single-threaded execution model, the UI will hang and be unresponsive until the search completes. Your program has no choice but to wait for the results to finish.
But if you have several threads, you can offload the search operation to a different thread, and then have a callback -- a trigger which is invoked when the work is completed -- to let you know that things are ready. This frees up the UI and allows it to continue to respond to events.
Unfortunately, because ActionScript's execution model doesn't support threads natively, it's not possible to get true threading. There is a rough approximation called "green threads", which are threads that are controlled by an execution context or virtual machine rather than a larger operating system, which is how it's usually done. Several people have taken a stab at it, although I can't say how widespread their usage is. You can read more at Alex Harui's blog here and see an example of green threads for ActionScript here.
It really depends on what you mean. The execution model for ActionScript is single-threaded, meaning it can not run a process in the background.
If you are not familiar with threading, it is essentially the ability to have something executed in the background of a main process.
So, if you needed to do a huge mathematical computation in your flex/flash project, with a multi-threaded program you could do that in the background while you simultaneously updated your UI. Because ActionScript is not multi-threaded you can not do such things. However, you can create a pseudo-threading class as demonstrated here:
http://blogs.adobe.com/aharui/pseudothread/PseudoThread.as
The others have described what threading is, and you'd need threading if you were getting hardcore into C++ and 3D game engines, among many other computationally-expensive operations, and languages that support multi-threading.
Actionscript doesn't have multi-threading. It executes all code in one frame. So if you create a for loop that processes 100,000,000 items, it will cause the app to freeze. That's because the Flash Player can only execute one thread of code at a time, per frame.
You can achieve pseudo-threading by using:
Timers
Event.ENTER_FRAME
Those allow you to jump around and execute code.
Tween engines like TweenMax can operate on 1000's of objects at once over a few seconds by using Timers. You can also do this with Event.ENTER_FRAME. There is something called "chunking" (check out Grant Skinner's AS3 Optimizations Presentation), which says "execute computationally expensive tasks over a few frames", like drawing complex bitmaps, which is a pseudo-multi-threading thing you can do with actionscript.
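The chunking idea can be sketched in a language-neutral way. The following is Python, not ActionScript: the generator `chunked_sum` is a made-up example, and the `for` loop plays the role of successive ENTER_FRAME ticks.

```python
from typing import Iterator, List

def chunked_sum(items: List[int], chunk_size: int) -> Iterator[int]:
    """Sum `items` one slice at a time, yielding the running total."""
    total = 0
    for start in range(0, len(items), chunk_size):
        total += sum(items[start:start + chunk_size])
        yield total          # hand control back, like returning from a frame handler

frames = 0
result = 0
task = chunked_sum(list(range(1000)), chunk_size=100)
for partial in task:         # each iteration stands in for one frame tick
    frames += 1
    result = partial         # the UI could redraw here between chunks
```

The work is the same as a single loop over all items, but it is spread over ten "frames" so that nothing blocks for the whole duration.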
A lot of other things are asynchronous, like service calls. If you make an HTTPService request in Flex, it will send a request to the server and then continue executing code in that frame. Once it's done, the server can still be processing that request (say it's saving a 30mb video to a database on the server), and that might take a minute. Then it will send something back to Flex and you can continue code execution with a ResultEvent.RESULT event handler.
So Actionscript basically uses:
Asynchronous events, and
Timers...
... to achieve pseudo-multi-threading.
A thread allows you to execute two or more blocks of ActionScript simultaneously. By default you will always be executing on the same default thread unless you explicitly start a new thread.

How commonly do deadlock issues occur in programming?

I've programmed in a number of languages, but I am not aware of deadlocks in my code.
I took this to mean it doesn't happen.
Does this happen frequently (in programming, not in the databases) enough that I should be concerned about it?
Deadlocks can arise if two conditions are true: you have multiple threads, and they contend for more than one resource.
Do you write multi-threaded code? You might do this explicitly by starting your own threads, or you might work in a framework where the threads are created out of your sight, and so you're running in more than one thread without you seeing that in your code.
An example: the Java Servlet API. You write a servlet or JSP. You deploy to the app server. Several users hit your web site, and hence your servlet. The server will likely have a thread per user.
Now consider what happens if, in servicing the requests, you want to acquire some resources:
if (userIsImportant) {
    getResourceA();
}
getResourceB();
if (todayIsThursday) {
    getResourceA();
}
// some more code
releaseResourceA();
releaseResourceB();
In the contrived example above, think about what might happen on a Thursday when an important user's request arrives, and more or less simultaneously an unimportant user's request arrives.
The important user's thread gets resource A and wants B. The less important user gets resource B and wants A. Neither will let go of the resource that they already own ... deadlock.
This can actually happen quite easily if you are writing code that explicitly uses synchronization. Most commonly I see it happen when using databases, and fortunately databases usually have deadlock detection so we can find out what error we made.
Defense against deadlock:
Acquire resources in a well-defined order. In the above example, if resource A was always obtained before resource B, no deadlock would occur.
If possible, use timeouts so that you don't wait indefinitely for a resource. This will allow you to detect contention and apply defense 1.
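The first defense can be sketched as follows. This is a Python illustration with invented names (`lock_a`, `lock_b`, `handle_request`) standing in for resources A and B and the servlet's request handler.

```python
import threading

lock_a = threading.Lock()   # resource A
lock_b = threading.Lock()   # resource B
completed = []

def handle_request(name: str) -> None:
    # Every thread takes A first, then B, whatever it needs them for,
    # so no thread can hold B while waiting for A.
    with lock_a:
        with lock_b:
            completed.append(name)

threads = [threading.Thread(target=handle_request, args=(n,))
           for n in ("important", "unimportant")]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
```

The cost is that a thread may hold A for longer than it strictly needs to, which trades some concurrency for safety.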
It would be very hard to give an idea of how often it happens in reality (in production code? in development?) and that wouldn't really give a good idea of how much code is vulnerable to it anyway. (Quite often a deadlock will only occur in very specific situations.)
I've seen a few occurrences, although the most recent one I saw was in an Oracle driver (not in the database at all) due to a finalizer running at the same time as another thread trying to grab a connection. Fortunately I found another bug which let me avoid the finalizer running in the first place...
Basically deadlock is almost always due to trying to acquire one lock (B) whilst holding another one (A) while another thread does exactly the same thing the other way round. If one thread is waiting for B to be released, and the thread holding B is waiting for A to be released, neither is willing to let the other proceed.
Make sure you always acquire locks in the same order (and release them in the reverse order) and you should be able to avoid deadlock in most cases.
There are some odd cases where you don't directly have two locks, but it's the same basic principle. For example, in .NET you might use Control.Invoke from a worker thread in order to update the UI on the UI thread. Now Invoke waits until the update has been processed before continuing. Suppose your background thread holds a lock which the update requires... again, the worker thread is waiting for the UI thread, but the UI thread can't proceed because the worker thread holds the lock. Deadlock again.
This is the sort of pattern to watch out for. If you make sure you only lock where you need to, lock for as short a period as you can get away with, and document the thread safety and locking policies of all your code, you should be able to avoid deadlock. Like all threading topics, however, it's easier said than done.
If you get a chance take a look at first few chapters in Java Concurrency in Practice.
Deadlocks can occur in any concurrent programming situation, so it depends how much concurrency you deal with. Several examples of concurrent programming are: multi-process, multi-thread, and libraries introducing multi-thread. UI frameworks, event handling (such as timer event) could be implemented as threads. Web frameworks could spawn threads to handle multiple web requests simultaneously. With multicore CPUs you might see more concurrent situations visibly than before.
If A is waiting for B, and B is waiting for A, the circular wait causes the deadlock. So it also depends on the type of code you write. If you use distributed transactions, you can easily cause that type of scenario. Without distributed transactions, you risk inconsistencies such as money going missing between bank accounts.
It all depends on what you are coding. For traditional single-threaded applications that do not use locking: not really.
Multi-threaded code with multiple locks is what will cause deadlocks.
I just finished refactoring code that used seven different locks without proper exception handling. That had numerous deadlock issues.
A common cause of deadlocks is when you have different threads (or processes) acquire a set of resources in different order.
E.g. if you have some resource A and B, if thread 1 acquires A and then B, and thread 2 acquires B and then A, then this is a deadlock waiting to happen.
There's a simple solution to this problem: have all your threads always acquire resources in the same order. E.g. if all your threads acquire A and B in that order, you will avoid deadlock.
A deadlock is a situation in which two processes are dependent on each other - one cannot finish before the other. Therefore, you will likely only have a deadlock in your code if you are running multiple code flows at any one time.
Developing a multi-threaded application means you need to consider deadlocks. A single-threaded application is unlikely to have deadlocks - but not impossible, the obvious example being that you may be using a DB which is subject to deadlocking.

What are common reasons for deadlocks?

Deadlocks are hard to find and very uncomfortable to remove.
How can I find error sources for deadlocks in my code? Are there any "deadlock patterns"?
In my special case, it deals with databases, but this question is open for every deadlock.
Update: This recent MSDN article, Tools And Techniques to Identify Concurrency Issues, might also be of interest
Stephen Toub in the MSDN article Deadlock monitor states the following four conditions necessary for deadlocks to occur:
A limited number of a particular resource. In the case of a monitor in C# (what you use when you employ the lock keyword), this limited number is one, since a monitor is a mutual-exclusion lock (meaning only one thread can own a monitor at a time).
The ability to hold one resource and request another. In C#, this is akin to locking on one object and then locking on another before releasing the first lock, for example:
lock(a)
{
    ...
    lock(b)
    {
        ...
    }
}
No preemption capability. In C#, this means that one thread can't force another thread to release a lock.
A circular wait condition. This means that there is a cycle of threads, each of which is waiting for the next to release a resource before it can continue.
He goes on to explain that the way to avoid deadlocks is to avoid (or thwart) condition four.
Joe Duffy discusses several techniques for avoiding and detecting deadlocks, including one known as lock leveling. In lock leveling, locks are assigned numerical values, and threads must only acquire locks that have higher numbers than locks they have already acquired. This prevents the possibility of a cycle. It's also frequently difficult to do well in a typical software application today, and a failure to follow lock leveling on every lock acquisition invites deadlock.
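Lock leveling as described in the quote can be sketched like this. It is a minimal Python illustration, not Duffy's implementation: the class `LeveledLock` and the exception `LevelViolation` are hypothetical names, and the sketch assumes locks are released in reverse order of acquisition.

```python
import threading

class LevelViolation(Exception):
    """Raised when locks are taken out of level order (hypothetical name)."""

_held = threading.local()   # per-thread stack of levels currently held

class LeveledLock:
    def __init__(self, level: int) -> None:
        self.level = level
        self._lock = threading.Lock()

    def __enter__(self) -> "LeveledLock":
        stack = getattr(_held, "levels", [])
        if stack and self.level <= stack[-1]:
            raise LevelViolation(
                f"level {self.level} requested while holding level {stack[-1]}")
        self._lock.acquire()
        _held.levels = stack + [self.level]
        return self

    def __exit__(self, *exc) -> None:
        _held.levels = _held.levels[:-1]
        self._lock.release()

low, high = LeveledLock(1), LeveledLock(2)

with low:
    with high:               # 1 then 2: allowed
        in_order_ok = True

try:
    with high:
        with low:            # 2 then 1: rejected before it can block
            violation = False
except LevelViolation:
    violation = True
```

The check turns a potential deadlock into an immediate, reproducible error, but only for locks that are actually wrapped this way.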
The classic deadlock scenario is A is holding lock X and wants to acquire lock Y, while B is holding lock Y and wants to acquire lock X. Since neither can complete what they are trying to do both will end up waiting forever (unless timeouts are used).
In this case a deadlock can be avoided if A and B acquire the locks in the same order.
No deadlock patterns to my knowledge (and that's after 12 years of writing heavily multithreaded trading applications). But the TimedLock class has been of great help in finding deadlocks that exist in code without massive rework.
http://www.randomtree.org/eric/techblog/archives/2004/10/multithreading_is_hard.html
Basically (in .NET/C#), you search and replace all your "lock(xxx)" statements with "using (TimedLock.Lock(xxx))".
If a deadlock is ever detected (lock unable to be obtained within the specified timeout, defaults to 10 seconds), then an exception is thrown. My local version also immediately logs the stacktrace. Walk up the stacktrace (preferably debug build with line numbers) and you'll immediately see what locks were held at the point of failure, and which one it was attempting to get.
In dotnet 1.1, in a deadlock situation as described, as luck would have it all the threads which were locked would throw the exception at the same time. So you'd get 2+ stacktraces, and all the information necessary to fix the problem. (2.0+ may have changed the threading model internally enough to not be this lucky, I'm not sure)
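A rough Python analogue of the TimedLock idea is shown below. It is not the .NET class from the linked post; `timed_lock` and `LockTimeout` are hypothetical names, and the 10-second default only mirrors the default mentioned above.

```python
import threading
import traceback
from contextlib import contextmanager

class LockTimeout(Exception):
    """Raised when a lock cannot be obtained in time (hypothetical name)."""

@contextmanager
def timed_lock(lock: threading.Lock, timeout: float = 10.0):
    if not lock.acquire(timeout=timeout):
        # Log where we were stuck, then fail loudly instead of hanging.
        traceback.print_stack()
        raise LockTimeout(f"could not acquire lock within {timeout}s")
    try:
        yield
    finally:
        lock.release()

lock = threading.Lock()
with timed_lock(lock):
    acquired_normally = True

lock.acquire()                       # simulate a holder that never lets go
try:
    with timed_lock(lock, timeout=0.1):
        timed_out = False
except LockTimeout:
    timed_out = True
finally:
    lock.release()
```

A timeout does not prove a deadlock, only that the lock was unavailable for longer than expected, so the logged stack trace is a starting point for investigation.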
Making sure all transactions affect tables in the same order is the key to avoiding the most common of deadlocks.
For example:
Transaction A
    UPDATE TableA SET Foo = 'Bar'
    UPDATE TableB SET Bar = 'Foo'
Transaction B
    UPDATE TableB SET Bar = 'Foo'
    UPDATE TableA SET Foo = 'Bar'
This is extremely likely to result in a deadlock as Transaction A gets a lock on Table A, Transaction B gets a lock on table B, therefore neither of them get a lock for their second command until the other has finished.
All other forms of deadlocks are generally caused by high-intensity use and SQL Server deadlocking internally whilst allocating resources.
Yes - deadlocks occur when processes try to acquire resources in random order. If all your processes try to acquire the same resources in the same order, the possibilities for deadlocks are greatly reduced, if not eliminated.
Of course, this is not always easy to arrange...
The most common (according to my unscientific observations) DB deadlock scenario is very simple:
Two processes read something (a DB record for example), both acquire a shared lock on the associated resource (usually a DB page),
Both try to make an update, trying to upgrade their locks to exclusive ones - voila, deadlock.
This can be avoided by specifying the "FOR UPDATE" clause (or similar, depending on your particular RDBMS) if the read is to be followed by an update. This way the process gets the exclusive lock from the start, making the above scenario impossible.
I recommend reading this article by Herb Sutter. It explains the reasons behind deadlocking issues and puts forward a framework to tackle the problem.
The typical scenario is mismatched update plans (tables not always updated in the same order). However, it is not unusual to have deadlocks under high processing volume.
I tend to accept deadlocks as a fact of life, it will happen one day or another so I have my DAL prepared to handle and retry a deadlocked operation.
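A retry wrapper of the kind described might look roughly like this. It is a Python sketch: `DeadlockDetected` stands in for whatever error a real database driver raises for a deadlock victim, and `flaky_update` merely simulates an operation that fails twice.

```python
import time

class DeadlockDetected(Exception):
    """Stand-in for the driver-specific 'deadlock victim' error."""

def run_with_retry(operation, attempts: int = 3, delay: float = 0.01):
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except DeadlockDetected:
            if attempt == attempts:
                raise                      # give up after the last attempt
            time.sleep(delay * attempt)    # brief back-off before retrying

calls = 0

def flaky_update() -> str:
    global calls
    calls += 1
    if calls < 3:                          # simulate two deadlocked attempts
        raise DeadlockDetected()
    return "committed"

outcome = run_with_retry(flaky_update)
```

Retrying is only safe if the whole transaction is re-run from the start, since the database rolls back the victim's work.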
A deadlock is a condition that occurs when two processes are each waiting for the other to complete before proceeding. The result is that both processes hang.
It occurs most commonly in multitasking and client/server environments.
Deadlock occurs mainly when multiple dependent locks exist: one thread locks the mutexes in one order, and another thread tries to lock them in the reverse order. Pay attention to how you use mutexes in order to avoid deadlocks.
Be sure to complete the operation after releasing the lock. If you have multiple locks, for example acquired in the order A, B, C, the release order should also be A, B, C.
In my last project I faced a problem with deadlocks in a SQL Server database. The difficulty in finding the cause was that my software and a third-party application were using the same database and working on the same tables. It was very hard to find out what caused the deadlocks. I ended up writing a SQL query to find out which processes and which SQL statements were causing the deadlocks. You can find that statement here: Deadlocks on SQL-Server
To avoid deadlock there is an algorithm called the Banker's algorithm.
This one also provides helpful information to avoid deadlock.
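The safety check at the core of the Banker's algorithm can be sketched as follows. This is a minimal Python illustration; the matrices are made-up example data, with one row per process and one column per resource type.

```python
from typing import List

def is_safe(available: List[int],
            allocation: List[List[int]],
            need: List[List[int]]) -> bool:
    """Return True if some order lets every process run to completion."""
    work = list(available)
    finished = [False] * len(allocation)
    progressed = True
    while progressed:
        progressed = False
        for i, done in enumerate(finished):
            if not done and all(n <= w for n, w in zip(need[i], work)):
                # Process i can finish and hand back everything it holds.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                progressed = True
    return all(finished)

allocation = [[0, 1], [2, 0], [1, 1]]   # what each process currently holds
need       = [[3, 1], [1, 2], [0, 1]]   # what each may still request
safe = is_safe([1, 1], allocation, need)
unsafe = is_safe([0, 0], allocation, need)
```

A resource request is granted only if the state that would result still passes this check; otherwise the requester waits.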

Resources