How to wait all spawned async tasks - asynchronous

I have an async method that uses tokio::fs to explore a directory:
use failure::Error;
use futures::Future;
use std::path::PathBuf;
use tokio::prelude::*;
fn visit_async(path: PathBuf) -> Box<Future<Item = (), Error = Error> + Send> {
let task = tokio::fs::read_dir(path)
.flatten_stream()
.for_each(move |entry| {
let path = entry.path();
if path.is_dir() {
let task = visit_async(entry.path());
tokio::spawn(task.map_err(drop));
} else {
println!("File: {:?}", path);
}
future::ok(())
})
.map_err(Error::from);
Box::new(task)
}
I need to execute another future after all the the future returned by this method ends as well as all the tasks spawned by it. Is there a better way that just starting another runtime?
let t = visit_async(PathBuf::from(".")).map_err(drop);
tokio::run(t);
tokio::run(future::ok(()));

I'd strive to avoid using tokio::spawn() here, and try to wrap it all into a single future (in general, I think you only do tokio::spawn when you don't care about the result or execution, which we do here). That should make it easy to wait for completion. I haven't tested this, but something along these lines might do the trick:
let task = tokio::fs::read_dir(path)
.flatten_stream()
.for_each(move |entry| {
let path = entry.path();
if path.is_dir() {
let task = visit_async(entry.path());
future::Either::A(task)
} else {
println!("File: {:?}", path);
future::Either::B(future::ok(()))
}
})
.map_err(Error::from)
.and_then(|_| {
// Do some work once all tasks complete
});
Box::new(task)
This will cause the asynchronous tasks to execute in sequence. You could use and_then instead of for_each to execute them in parallel and then into_future().and_then(|_| { ... }) to tuck on some action to execute afterwards.

There's another issue with parallel descent in the FS: you may run out of file descriptors.
There is a way to solve both issues by creating a tokio::sync::Semaphore to limit the concurrent number of these tasks. After you are done spawning all of them, you can use Semaphore::acquire_many with the same value you used at creation, to block until all other tasks are finished.
For correctness, you should acquire the semaphore before spawning the task, and then pass the SemaphorePermit to the task (and make sure it doesn't get dropped before you are done). If you acquire the semaphore inside the tasks, there is a risk the main task might acquire all the permits before the first sub-task has a chance to run.
Since you can only move a SemaphorePermit<'static> inside the task, you will need to have a &'static Semaphore, for instance using lazy_static! or Box::leak.

Related

Is there a way to poll several futures simultaniously in rust async

I'm trying to execute several sqlx queries in parallel given by a iterator.
This is probably the closest I've got so far.
let mut futures = HahshMap::new() // placeholder, filled HashMap in reality
.iter()
.map(async move |(_, item)| -> Result<(), sqlx::Error> {
let result = sqlx::query_file_as!(
// omitted
)
.fetch_one(&pool)
.await?;
channel.send(Enum::Event(result)).ignore();
Ok(())
})
.clollect();
futures::future::join_all(futures);
All queries and sends are independent from each other, so if one of them fails, the others should still get processed.
Futthermore the current async closure is not possible like this.
Rust doesn't yet have async closures. You instead need to have the closure return an async block:
move |(_, item)| async move { ... }
Additionally, make sure you .await the future returned by join_all in order to ensure the individual tasks are actually polled.

Why does async-std's task::spawn prevent subsequent lines of code from executing?

I'm trying to use async-std's task::spawn:
use std::time::Duration;
async fn w() {
loop {
task::sleep(Duration::from_secs(1)).await;
println!("Tick");
}
}
#[async_std::main]
async fn main() {
println!("Start");
task::spawn(w()).await;
println!("End");
}
I expect that "End" is immediately printed after "Start", but the "Tick" loop is printed endlessly.
So what is exactly the difference to the following?
#[async_std::main]
async fn main() {
println!("Start");
w().await;
println!("End");
}
Don't .await the spawned task. Doing so is like joining a thread: it waits for the task to finish before continuing. The handle spawn returns does not need to be awaited in order for the task to be driven to completion.
task::spawn(w()).await;
As written, you are correct that task::spawn(w()).await is no different than w().await. They both block the thread waiting for w() to finish.
What's the purpose of awaiting a spawn call, then? It's useful if you want to spawn a background task, do some other work on the current thread, and then block.
let handle = task::spawn(w());
do_other_things_for_a_while();
// Block and retrieve the result.
let result = handle.await;

How do I execute an async/await function without using any external dependencies?

I am attempting to create simplest possible example that can get async fn hello() to eventually print out Hello World!. This should happen without any external dependency like tokio, just plain Rust and std. Bonus points if we can get it done without ever using unsafe.
#![feature(async_await)]
async fn hello() {
println!("Hello, World!");
}
fn main() {
let task = hello();
// Something beautiful happens here, and `Hello, World!` is printed on screen.
}
I know async/await is still a nightly feature, and it is subject to change in the foreseeable future.
I know there is a whole lot of Future implementations, I am aware of the existence of tokio.
I am just trying to educate myself on the inner workings of standard library futures.
My helpless, clumsy endeavours
My vague understanding is that, first off, I need to Pin task down. So I went ahead and
let pinned_task = Pin::new(&mut task);
but
the trait `std::marker::Unpin` is not implemented for `std::future::GenFuture<[static generator#src/main.rs:7:18: 9:2 {}]>`
so I thought, of course, I probably need to Box it, so I'm sure it won't move around in memory. Somewhat surprisingly, I get the same error.
What I could get so far is
let pinned_task = unsafe {
Pin::new_unchecked(&mut task)
};
which is obviously not something I should do. Even so, let's say I got my hands on the Pinned Future. Now I need to poll() it somehow. For that, I need a Waker.
So I tried to look around on how to get my hands on a Waker. On the doc it kinda looks like the only way to get a Waker is with another new_unchecked that accepts a RawWaker. From there I got here and from there here, where I just curled up on the floor and started crying.
This part of the futures stack is not intended to be implemented by many people. The rough estimate that I have seen in that maybe there will be 10 or so actual implementations.
That said, you can fill in the basic aspects of an executor that is extremely limited by following the function signatures needed:
async fn hello() {
println!("Hello, World!");
}
fn main() {
drive_to_completion(hello());
}
use std::{
future::Future,
ptr,
task::{Context, Poll, RawWaker, RawWakerVTable, Waker},
};
fn drive_to_completion<F>(f: F) -> F::Output
where
F: Future,
{
let waker = my_waker();
let mut context = Context::from_waker(&waker);
let mut t = Box::pin(f);
let t = t.as_mut();
loop {
match t.poll(&mut context) {
Poll::Ready(v) => return v,
Poll::Pending => panic!("This executor does not support futures that are not ready"),
}
}
}
type WakerData = *const ();
unsafe fn clone(_: WakerData) -> RawWaker {
my_raw_waker()
}
unsafe fn wake(_: WakerData) {}
unsafe fn wake_by_ref(_: WakerData) {}
unsafe fn drop(_: WakerData) {}
static MY_VTABLE: RawWakerVTable = RawWakerVTable::new(clone, wake, wake_by_ref, drop);
fn my_raw_waker() -> RawWaker {
RawWaker::new(ptr::null(), &MY_VTABLE)
}
fn my_waker() -> Waker {
unsafe { Waker::from_raw(my_raw_waker()) }
}
Starting at Future::poll, we see we need a Pinned future and a Context. Context is created from a Waker which needs a RawWaker. A RawWaker needs a RawWakerVTable. We create all of those pieces in the simplest possible ways:
Since we aren't trying to support NotReady cases, we never need to actually do anything for that case and can instead panic. This also means that the implementations of wake can be no-ops.
Since we aren't trying to be efficient, we don't need to store any data for our waker, so clone and drop can basically be no-ops as well.
The easiest way to pin the future is to Box it, but this isn't the most efficient possibility.
If you wanted to support NotReady, the simplest extension is to have a busy loop, polling forever. A slightly more efficient solution is to have a global variable that indicates that someone has called wake and block on that becoming true.

What is the difference between launch/join and async/await in Kotlin coroutines

In the kotlinx.coroutines library you can start new coroutine using either launch (with join) or async (with await). What is the difference between them?
launch is used to fire and forget coroutine. It is like starting a new thread. If the code inside the launch terminates with exception, then it is treated like uncaught exception in a thread -- usually printed to stderr in backend JVM applications and crashes Android applications. join is used to wait for completion of the launched coroutine and it does not propagate its exception. However, a crashed child coroutine cancels its parent with the corresponding exception, too.
async is used to start a coroutine that computes some result. The result is represented by an instance of Deferred and you must use await on it. An uncaught exception inside the async code is stored inside the resulting Deferred and is not delivered anywhere else, it will get silently dropped unless processed. You MUST NOT forget about the coroutine you’ve started with async.
I find this guide to be useful. I will quote the essential parts.
🦄 Coroutines
Essentially, coroutines are light-weight threads.
So you can think of a coroutine as something that manages thread in a very efficient way.
🐤 launch
fun main(args: Array<String>) {
launch { // launch new coroutine in background and continue
delay(1000L) // non-blocking delay for 1 second (default time unit is ms)
println("World!") // print after delay
}
println("Hello,") // main thread continues while coroutine is delayed
Thread.sleep(2000L) // block main thread for 2 seconds to keep JVM alive
}
So launch starts a coroutine, does something, and returns a token immediately as Job. You can call join on this Job to block until this launch coroutine completes.
fun main(args: Array<String>) = runBlocking<Unit> {
val job = launch { // launch new coroutine and keep a reference to its Job
delay(1000L)
println("World!")
}
println("Hello,")
job.join() // wait until child coroutine completes
}
🦆 async
Conceptually, async is just like launch. It starts a separate coroutine which is a light-weight thread that works concurrently with all the other coroutines. The difference is that launch returns a Job and does not carry any resulting value, while async returns a Deferred -- a light-weight non-blocking future that represents a promise to provide a result later.
So async starts a background thread, does something, and returns a token immediately as Deferred.
fun main(args: Array<String>) = runBlocking<Unit> {
val time = measureTimeMillis {
val one = async { doSomethingUsefulOne() }
val two = async { doSomethingUsefulTwo() }
println("The answer is ${one.await() + two.await()}")
}
println("Completed in $time ms")
}
You can use .await() on a deferred value to get its eventual result, but Deferred is also a Job, so you can cancel it if needed.
So Deferred is actually a Job. Read this for more details.
interface Deferred<out T> : Job (source)
🦋 async is eager by default
There is a laziness option to async using an optional start parameter with a value of CoroutineStart.LAZY. It starts coroutine only when its result is needed by some await or if a start function is invoked.
launch and async are used to start new coroutines. But, they execute them in different manner.
I would like to show very basic example which will help you understand difference very easily
launch
class MainActivity : AppCompatActivity() {
override fun onCreate(savedInstanceState: Bundle?) {
super.onCreate(savedInstanceState)
setContentView(R.layout.activity_main)
btnCount.setOnClickListener {
pgBar.visibility = View.VISIBLE
CoroutineScope(Dispatchers.Main).launch {
val currentMillis = System.currentTimeMillis()
val retVal1 = downloadTask1()
val retVal2 = downloadTask2()
val retVal3 = downloadTask3()
Toast.makeText(applicationContext, "All tasks downloaded! ${retVal1}, ${retVal2}, ${retVal3} in ${(System.currentTimeMillis() - currentMillis)/1000} seconds", Toast.LENGTH_LONG).show();
pgBar.visibility = View.GONE
}
}
// Task 1 will take 5 seconds to complete download
private suspend fun downloadTask1() : String {
kotlinx.coroutines.delay(5000);
return "Complete";
}
// Task 1 will take 8 seconds to complete download
private suspend fun downloadTask2() : Int {
kotlinx.coroutines.delay(8000);
return 100;
}
// Task 1 will take 5 seconds to complete download
private suspend fun downloadTask3() : Float {
kotlinx.coroutines.delay(5000);
return 4.0f;
}
}
In this example, my code is downloading 3 data on click of btnCount button and showing pgBar progress bar until all download gets completed. There are 3 suspend functions downloadTask1(), downloadTask2() and downloadTask3() which downloads data. To simulate it, I've used delay() in these functions. These functions waits for 5 seconds, 8 seconds and 5 seconds respectively.
As we've used launch for starting these suspend functions, launch will execute them sequentially (one-by-one). This means that, downloadTask2() would start after downloadTask1() gets completed and downloadTask3() would start only after downloadTask2() gets completed.
As in output screenshot Toast, total execution time to complete all 3 downloads would lead to 5 seconds + 8 seconds + 5 seconds = 18 seconds with launch
async
As we saw that launch makes execution sequentially for all 3 tasks. The time to complete all tasks was 18 seconds.
If those tasks are independent and if they do not need other task's computation result, we can make them run concurrently. They would start at same time and run concurrently in background. This can be done with async.
async returns an instance of Deffered<T> type, where T is type of data our suspend function returns. For example,
downloadTask1() would return Deferred<String> as String is return type of function
downloadTask2() would return Deferred<Int> as Int is return type of function
downloadTask3() would return Deferred<Float> as Float is return type of function
We can use the return object from async of type Deferred<T> to get the returned value in T type. That can be done with await() call. Check below code for example
btnCount.setOnClickListener {
pgBar.visibility = View.VISIBLE
CoroutineScope(Dispatchers.Main).launch {
val currentMillis = System.currentTimeMillis()
val retVal1 = async(Dispatchers.IO) { downloadTask1() }
val retVal2 = async(Dispatchers.IO) { downloadTask2() }
val retVal3 = async(Dispatchers.IO) { downloadTask3() }
Toast.makeText(applicationContext, "All tasks downloaded! ${retVal1.await()}, ${retVal2.await()}, ${retVal3.await()} in ${(System.currentTimeMillis() - currentMillis)/1000} seconds", Toast.LENGTH_LONG).show();
pgBar.visibility = View.GONE
}
This way, we've launched all 3 tasks concurrently. So, my total execution time to complete would be only 8 seconds which is time for downloadTask2() as it is largest of all of 3 tasks. You can see this in following screenshot in Toast message
both coroutine builders namely launch and async are basically lambdas with receiver of type CoroutineScope which means their inner block is compiled as a suspend function, hence they both run in an asynchronous mode AND they both will execute their block sequentially.
The difference between launch and async is that they enable two different possibilities. The launch builder returns a Job however the async function will return a Deferred object. You can use launch to execute a block that you don't expect any returned value from it i.e writing to a database or saving a file or processing something basically just called for its side effect. On the other hand async which return a Deferred as I stated previously returns a useful value from the execution of its block, an object that wraps your data, so you can use it for mainly its result but possibly for its side effect as well. N.B: you can strip the deferred and get its value using the function await, which will block the execution of your statements until a value is returned or an exceptions is thrown! You could achieve the same thing with launch by using the function join()
both coroutine builder (launch and async) are cancelable.
anything more?: yep with launch if an exception is thrown within its block, the coroutine is automatically canceled and the exceptions is delivered. On the other hand, if that happens with async the exception is not propagated further and should be caught/handled within the returned Deferred object.
more on coroutines https://kotlinlang.org/docs/tutorials/coroutines/coroutines-basic-jvm.html
https://www.codementor.io/blog/kotlin-coroutines-6n53p8cbn1
Async and Launch, both are used to create coroutines that run in the background. In almost every situation one can use either of them.
tl;dr version:
When you dont care about the task's return value, and just want to run it, you may use Launch. If you need the return type from the task/coroutine you should use async.
Alternate:
However, I feel the above difference/approach is a consequence of thinking in terms of Java/one thread per request model. Coroutines are so inexpensive, that if you want to do something from the return value of some task/coroutine(lets say a service call) you are better off creating a new coroutine from that one. If you want a coroutine to wait for another coroutine to transfer some data, I would recommend using channels and not the return value from Deferred object. Using channels and creating as much number of coroutines as required, is the better way IMO
Detailed answer:
The only difference is in the return type and what functionality it provides.
Launch returns Job while Async returns Deferred. Interestingly enough, Deferred extends Job. Which implies it must be providing additional functionality on top of Job. Deferred is type parameterised over where T is the return type. Thus, Deferred object can return some response from the block of code executed by async method.
p.s. I only wrote this answer because I saw some factually incorrect answers on this question and wanted to clarify the concept for everyone. Also, while working on a pet project myself I faced similar problem because of previous Java background.
launch returns a job
async returns a result (deferred job)
launch with join is used to wait until the job gets finished. It simply suspends the coroutine calling join(), leaving the current thread free to do other work (like executing another coroutine) in the meantime.
async is used to compute some results. It creates a coroutine and returns its future result as an implementation of Deferred. The running coroutine is cancelled when the resulting deferred is cancelled.
Consider an async method that returns a string value. If the async method is used without await it will return a Deferred string but if await is used you will get a string as the result
The key difference between async and launch:
Deferred returns a particular value of type T after your Coroutine finishes executing, whereas Job doesn’t.
launch / async no result
Use when don't need the result,
Don't block the code where is called,
Run in sequential
async for result
When you need to wait for the result and can run in parallel for
efficiency,
Block the code where is called,
Run in concurrent
Alongside the other great answers, for the people familiar with Rx and getting into coroutines, async returns a Deferred which is akin to Single while launch returns a Job that is more akin to Completable. You can .await() to block and get the value of the first one, and .join() to block until the Job is completed.

async programming in dart

I am relating to java as to how to do thread/async. I use new Thread(target).start() where target is Runnable as one way to do threading in java. New concurrent api has alternatives but we know that on specific call new threads are creating and passed in tasks are executed.
Similarly how is async done in Dart ?
I read on send/receivport, completer/future, spawnFunction. To me only spawnFunction is convincing statement that will create new thread. can one explain how completer/future help. i know they take callbacks but is there some implicit logic/rule in javascript/dart that callbacks always be execute in different thread.
below is dart snippet/pseudo code:
void callback() {
print("callback called");
}
costlyQuery(sql, void f()) {
executeSql(sql);
f();
}
costlyQuery("select * from dual", callback);
I hope my costlyQuery signature to take function as 2nd parameter is correct. so now I do not think that f() after executeSql(sql) is going to be async. may be taking above example add completer/future if that can make async to help me understand.
tl;dr: There is no implicit rule that a callback will not block.
Javascript event queue
In Javascript, there are no threads (except WebWorkers, but that's different). This means that if any part of your code blocks, the whole application is blocked. There is nothing magic about callbacks, they are just functions:
function longLoop(cb) {
var i = 1000000;
while (i--) ;
cb();
}
function callback() {
console.log("Hello world");
}
function fun() {
longLoop(callback);
console.log("Called after Hello World is printed");
}
To make something asynchronous, the callback must be put on the event queue, which only happens through some API calls:
user initiated event handlers- clicks, keyboard, mouse
API event handlers- XmlHTTPRequest callbacks, WebWorker communication
timing functions- setTimeout, setInterval
When the event is triggered, the callback is placed on the event queue to be executed when all other callbacks have finished executing. This means that no two lines of your code will ever be executing at the same time.
Futures
I assume you have at least stumbled on this post about futures. If not, it's a good read and comes straight from the horses mouth.
In Javascript, you have to do some work to make sense of asynchronous code. There's even a Javascript library called futures for doing common things, such as asynchronous loops and sequences (full disclosure, I have personally worked with the author of this library).
The notion of a future is a promise made by the function creating the future that the future will be completed at some point down the road. If you are going to make some asynchronous call, create a future, return it, and fulfill the promise when the asynchronous call finishes. From the blog post:
Future<Results> costlyQuery() {
var completer = new Completer();
database.query("SELECT * FROM giant_table", (results) {
// when complete
completer.complete(results);
});
// this returns essentially immediately,
// before query is finished
return completer.future;
}
If database.query is a blocking call, the future will be completed immediately. This example assumes that database.query is a non-blocking call (async), so the future will be fulfilled after this function exits. completer.complete will call whatever function is passed to completer.then() with the arguments specified.
Your example, modified to be idiomatic and asynchronous:
void callback() {
print("callback called");
}
costlyQuery(sql) {
var completer = new Completer();
executeSql(sql, () => completer.complete());
return completer.future;
}
costlyQuery("select * from dual").then(callback);
This is asynchronous and uses futures correctly. You could pass your own callback function into costlyQuery as you have done and call that in the callback to executeSql to achieve the same thing, but that is the Javascript way of doing it, not the Dart way.
Isolates
Isolates are similar to Java's threads, except that isolates are a concurrency model and threads are a parallelism model (see this SO question for more information about concurrency vs parallelism).
Dart is special in that the specification does not mandate that everything be run in a single thread. It only mandates that different executing contexts do not have access to the same data. Each isolate has it's own memory and can only communicate (e.g. send data) with other isolates through send/receive ports.
These ports work similarly to channels in Go if you're familiar.
An isolate is an isolated execution context that runs independently of other code. In Javascript terms, this means that it has it's own event queue. See the Dart tour for a more complete introduction to isolates.
Lets say your executeSql is a blocking function. If we don't want to wait for it to finish, we could load it into an isolate:
void callback() {
print("callback called");
}
void costlyQuery() {
port.receive((sql, reply) {
executeSql(sql);
reply.send();
});
}
main() {
var sendPort = spawnFunction(costlyQuery);
// .call does all the magic needed to communicate both directions
sendPort.call("select * from dual").then(callback);
print("Called before executeSql finishes");
}
This code creates an isolate, sends data to it then registers a callback for when it's done. Even if executeSql blocks, main() will not necessarily block.

Resources