How to pass results on as they arrive using coroutines? - asynchronous

Let's say I have a list of repos. I want to iterate through all of them, and as each repo returns its result, pass it on.
val repos = listOf(repo1, repo2, repo3)
val deferredItems = mutableListOf<Deferred<List<result>>>()
repos.forEach { repo ->
    deferredItems.add(async { getResult(repo) })
}
val results = mutableListOf<Any>()
deferredItems.forEach { deferredItem ->
    results.add(deferredItem.await())
}
println("results :: $results")
In the above case, it waits for each repo to return its result and fills the results in sequence: the result of repo1 followed by the result of repo2. If repo1 takes longer than repo2 to return its result, we end up waiting for repo1's result even though repo2's result is already available.
Is there any way to pass on the result of repo2 as soon as we have it?

The Flow API supports this almost directly:
repos.asFlow()
    .flatMapMerge { flow { emit(getResult(it)) } }
    .collect { println(it) }
flatMapMerge applies the lambda you pass to it to each upstream item to get a Flow, collects those inner Flows concurrently, and sends each value downstream as soon as it is ready.
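For reference, here is a minimal, self-contained sketch of that approach. The repo names, delays and the getResult() implementation below are made-up stand-ins for the ones in the question, and flatMapMerge may require opting in to experimental coroutines APIs depending on your kotlinx.coroutines version:
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.*

// Made-up stand-in for the question's getResult(); repo1 is deliberately the slowest.
suspend fun getResult(repo: String): String {
    delay(if (repo == "repo1") 300L else 100L)
    return "result of $repo"
}

@OptIn(ExperimentalCoroutinesApi::class)
fun main() = runBlocking {
    listOf("repo1", "repo2", "repo3")
        .asFlow()
        .flatMapMerge { repo -> flow { emit(getResult(repo)) } }
        .collect { println(it) } // repo2 and repo3 are printed before repo1
}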

That's what channels are for:
val repos = listOf("repo1", "repo2", "repo3")
val results = Channel<Result>()
repos.forEach { repo ->
launch {
val res = getResult(repo)
results.send(res)
}
}
for (r in results) {
println(r)
}
This example is incomplete: I don't close the channel, so the code will remain suspended forever. Make sure that in your real code you close the channel once all results have been received:
val count = AtomicInteger()
for (r in results) {
    println(r)
    if (count.incrementAndGet() == repos.size) {
        results.close()
    }
}
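If the manual counting with AtomicInteger feels error-prone, the produce builder is one way to avoid it: it closes the channel automatically when its block returns, and wrapping the launches in coroutineScope makes the block wait for every sender. This is only a sketch under the same assumptions as above (a made-up getResult(); produce is still marked experimental in some kotlinx.coroutines versions):
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.produce

// Made-up stand-in for the question's getResult().
suspend fun getResult(repo: String): String {
    delay(100L)
    return "result of $repo"
}

@OptIn(ExperimentalCoroutinesApi::class)
fun main() = runBlocking {
    val repos = listOf("repo1", "repo2", "repo3")
    // produce closes the channel when its block returns; coroutineScope makes the
    // block wait until every launched sender has finished.
    val results = produce {
        coroutineScope {
            repos.forEach { repo -> launch { send(getResult(repo)) } }
        }
    }
    for (r in results) {
        println(r) // printed in completion order, and the loop ends on its own
    }
}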

You should use Channels.
suspend fun loadReposConcurrent() = coroutineScope {
    val repos = listOf(repo1, repo2, repo3)
    val channel = Channel<List<YourResultType>>()
    for (repo in repos) {
        launch {
            val result = getResult(repo)
            channel.send(result)
        }
    }
    var allResults = emptyList<YourResultType>()
    repeat(repos.size) {
        val result = channel.receive()
        allResults = allResults + result
        println("results :: $result")
        //updateUi(allResults)
    }
}
In the code above, every request in the for (repo in repos) {...} loop runs in a separate coroutine started with launch, and as soon as its result is ready it is sent to the channel.
In repeat(repos.size) {...}, channel.receive() waits for the values coming from those coroutines and consumes them as they arrive.
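As a sanity check, here is a runnable version of this idea with made-up stand-ins (String repos and a fake getResult() in which repo1 is deliberately the slowest); it shows that results are consumed in completion order rather than list order:
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

// Made-up stand-ins; the real repo objects, result type and getResult() come from the question.
suspend fun getResult(repo: String): List<String> {
    delay(if (repo == "repo1") 300L else 100L)
    return listOf("result of $repo")
}

suspend fun loadReposConcurrent() = coroutineScope {
    val repos = listOf("repo1", "repo2", "repo3")
    val channel = Channel<List<String>>()
    for (repo in repos) {
        launch { channel.send(getResult(repo)) }
    }
    repeat(repos.size) {
        println("results :: ${channel.receive()}") // repo2 and repo3 arrive before repo1
    }
}

fun main() = runBlocking { loadReposConcurrent() }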

Related

Parallel work stealing in arbitrary order in Rust

I'm trying to write a parallel data loader for deep learning in Rust. The task is to write an iterator that, under the hood, does the following:
1. Reads files from disk and applies some compute-heavy preprocessing to them; the result is generally a numeric array (or several).
2. Groups the results of the previous step into batches of size B and "collates" them, which generally just means concatenating the arrays (moderately compute-heavy).
3. Yields the results from step 2.
Step 1 can be both IO and compute bound, depending on network latency, size of files and complexity of preprocessing. It has to be run in parallel by many workers. Step 2 should be off the main thread but likely doesn't need a pool of workers. Step 3 happens on main thread (exposed to Python).
The reason I write it in Rust is that Python offers two options: a pure-Python implementation shipped with PyTorch, based on multiprocessing, which is somewhat slow but very flexible (arbitrary user-defined data preprocessing and batching), and a C++ implementation shipped with TensorFlow, which is assembled by the user from a set of predefined primitives. The latter is substantially faster but too restrictive for the kinds of data processing I wish to do. I expect that Rust will give me the speed of TensorFlow with the flexibility of arbitrary code as in PyTorch.
My question is purely about the way to implement parallelism. The ideal setup is to have N workers for step 1) -> channel -> worker for step 2) -> channel -> step 3. Because the iterator object may be dropped at any time, there is a strict requirement to be able to terminate the whole scheme after Drop. On the other hand, there is the flexibility of loading the files in an arbitrary order: for example if the batch size B == 16 and max_n_threads == 32, it is perfectly fine to start 32 workers and yield the first batch containing the 16 examples which happen to return first. This can be exploited for speed.
My naive implementation creates the DataLoader in 3 steps:
1. Create an n_working: Arc<AtomicUsize> to control the number of active worker threads and a should_shutdown: Arc<AtomicBool> to signal shutdown (when Drop is called).
2. Create a thread responsible for maintaining the pool. It spins on n_working < max_n_threads and keeps spawning worker threads, which terminate on should_shutdown and otherwise fetch a single example, send it down the worker->batcher channel and decrement n_working.
3. Create a batching thread which polls the worker->batcher channel and, upon receiving B objects, concatenates them into a batch and sends it down the batcher->yielder channel.
#[pyclass]
struct DataLoader {
    collate_worker: Option<thread::JoinHandle<()>>,
    example_worker: Option<thread::JoinHandle<()>>,
    should_shut_down: Arc<AtomicBool>,
    receiver: Receiver<Batch>,
    length: usize,
}

impl DataLoader {
    fn new(
        dataset: Dataset,
        batch_size: usize,
        capacity: usize,
    ) -> Self {
        let n_batches = dataset.len() / batch_size;
        let max_n_threads = capacity * batch_size;
        let (example_sender, collate_receiver) = bounded((batch_size - 1) * capacity);
        let should_shut_down = Arc::new(AtomicBool::new(false));

        let shutdown_flag = should_shut_down.clone();
        let example_worker = thread::spawn(move || {
            rayon::scope_fifo(|s| {
                let dataset = &dataset;
                let n_working = Arc::new(AtomicUsize::new(0));
                let mut current_index = 0;
                while current_index < n_batches * batch_size {
                    if n_working.load(Ordering::Relaxed) == max_n_threads {
                        continue;
                    }
                    if shutdown_flag.load(Ordering::Relaxed) {
                        break;
                    }
                    let index = current_index.clone();
                    let sender = example_sender.clone();
                    let counter = n_working.clone();
                    let shutdown_flag = shutdown_flag.clone();
                    s.spawn_fifo(move |_s| {
                        let example = dataset.get_example(index);
                        if !shutdown_flag.load(Ordering::Relaxed) {
                            _ = sender.send(example);
                        } // if we should shut down, skip sending
                        counter.fetch_sub(1, Ordering::Relaxed);
                    });
                    current_index += 1;
                    n_working.fetch_add(1, Ordering::Relaxed);
                }
            });
        });

        let (batch_sender, final_receiver) = bounded(capacity);
        let shutdown_flag = should_shut_down.clone();
        let collate_worker = thread::spawn(move || {
            'outer: loop {
                let mut batch = vec![];
                for _ in 0..batch_size {
                    if let Ok(example) = collate_receiver.recv() {
                        batch.push(example);
                    } else {
                        break 'outer;
                    }
                }
                let collated = collate(batch);
                if shutdown_flag.load(Ordering::Relaxed) {
                    break; // skip sending
                }
                _ = batch_sender.send(collated);
            }
        });

        Self {
            collate_worker: Some(collate_worker),
            example_worker: Some(example_worker),
            should_shut_down,
            receiver: final_receiver,
            length: n_batches,
        }
    }
}

#[pymethods]
impl DataLoader {
    fn __iter__(slf: PyRef<Self>) -> PyRef<Self> { slf }
    fn __next__(&mut self) -> Option<Batch> {
        self.receiver.recv().ok()
    }
    fn __len__(&self) -> usize {
        self.length
    }
}

impl Drop for DataLoader {
    fn drop(&mut self) {
        self.should_shut_down.store(true, Ordering::Relaxed);
        if self.collate_worker.take().unwrap().join().is_err() {
            println!("Panic in collate worker");
        }
        if self.example_worker.take().unwrap().join().is_err() {
            println!("Panic in example_worker");
        }
        println!("dropped the dataloader");
    }
}
This implementation works and roughly matches the performance of PyTorch but provides no significant speedup. I don't know where to look for improvements, but I imagine it would help to have the thing load-balance automatically in a work-stealing way and to flexibly spawn workers depending on the proportion of IO and compute time. I am also expecting performance issues due to the spinning pool manager and likely corner cases in my handling of Drop.
My question is how best to approach the problem. I am generally unsure whether this should be tackled with parallel crates like rayon, async crates like tokio, or a mix of both. I also have a hunch that my implementation could be much simpler with the correct use of their combinators/higher-order APIs. I tried with rayon, but I couldn't get a solution which doesn't wastefully enforce the original sequential return order while still respecting the Drop requirement.
Okay I think I've figured out a solution for you that uses rayon parallel iterators.
The trick is to use Results in the rayon iterators, and return Err if the cancellation flag is set.
I first created a utility type to create a cancellable thread in which you can execute rayon iterators. You use it by passing in the thread closure which takes the atomic cancellation token as a parameter. Then you have to check if the cancellation token is true, and if so, exit early.
use std::sync::Arc;
use std::sync::atomic::{Ordering, AtomicBool};
use std::thread::JoinHandle;

fn collate(batch: &[Computed]) -> Batch {
    batch.iter().map(|&x| i128::from(x)).sum()
}

#[derive(Debug)]
struct Cancelled;

struct CancellableThread<Output: Send + 'static> {
    cancel_token: Arc<AtomicBool>,
    thread: Option<JoinHandle<Result<Output, Cancelled>>>,
}

impl<Output: Send + 'static> CancellableThread<Output> {
    fn new<F: FnOnce(Arc<AtomicBool>) -> Result<Output, Cancelled> + Send + 'static>(init: F) -> Self {
        let cancel_token = Arc::new(AtomicBool::new(false));
        let thread_cancel_token = Arc::clone(&cancel_token);
        CancellableThread {
            thread: Some(std::thread::spawn(move || init(thread_cancel_token))),
            cancel_token,
        }
    }

    fn output(mut self) -> Output {
        self.thread.take().unwrap().join().unwrap().unwrap()
    }
}

impl<Output: Send + 'static> Drop for CancellableThread<Output> {
    fn drop(&mut self) {
        self.cancel_token.store(true, Ordering::Relaxed);
        if let Some(thread) = self.thread.take() {
            let _ = thread.join().unwrap();
        }
    }
}
I found it useful to create a closure that returns a Result<(), Cancelled> so I could use the try operator (?) to exit early.
CancellableThread::new(move |cancel_token| {
    let cancelled = || if cancel_token.load(Ordering::Relaxed) {
        Err(Cancelled)
    } else {
        Ok(())
    };
    loop {
        // was the thread dropped?
        // if so, stop what we're doing
        cancelled()?;
        // do stuff and
        // eventually return a result
    }
});
I then used that CancellableThread abstraction in the DataLoader. There is no need to create a special Drop impl for it, because by default drop is called on each field anyway, which handles the cancellation.
type Data = Vec<u8>;
type Dataset = Vec<Data>;
type Computed = u64;
type Batch = i128;

use rayon::prelude::*;
use crossbeam::channel::{unbounded, Receiver};

struct DataLoader {
    example_worker: CancellableThread<()>,
    collate_worker: CancellableThread<()>,
    receiver: Receiver<Batch>,
    length: usize,
}
I used unbounded channels, as it was one less thing to bother about. It shouldn't be hard to switch to bounded ones instead.
impl DataLoader {
    fn new(dataset: Dataset, batch_size: usize) -> Self {
        let (example_sender, collate_receiver) = unbounded();
        let (batch_sender, final_receiver) = unbounded();
I'm not sure if you can always guarantee that the number of items in your dataset will be a multiple of the batch_size, so I decided to handle that explicitly.
        let length = if dataset.len() % batch_size == 0 {
            dataset.len() / batch_size
        } else {
            dataset.len() / batch_size + 1
        };
I created the collating worker first, though that may not be necessary. As you can see, I had to duplicate a little bit to handle partial batches.
        let collate_worker = CancellableThread::new(move |cancel_token| {
            let cancelled = || if cancel_token.load(Ordering::Relaxed) {
                Err(Cancelled)
            } else {
                Ok(())
            };
            'outer: loop {
                let mut batch = Vec::with_capacity(batch_size);
                for _ in 0..batch_size {
                    cancelled()?;
                    if let Ok(data) = collate_receiver.recv() {
                        batch.push(data);
                    } else {
                        if !batch.is_empty() {
                            // handle the last batch, if there
                            // weren't enough items to fill it
                            let collated = collate(&batch);
                            cancelled()?;
                            batch_sender.send(collated).unwrap();
                        }
                        break 'outer;
                    }
                }
                let collated = collate(&batch);
                cancelled()?;
                batch_sender.send(collated).unwrap();
            }
            Ok(())
        });
The example worker is where things are really made much simpler, because we can just use rayon parallel iterators. As you can see, we check for cancellation before each heavy computation.
        let example_worker = CancellableThread::new(move |cancel_token| {
            let cancelled = || if cancel_token.load(Ordering::Relaxed) {
                Err(Cancelled)
            } else {
                Ok(())
            };
            let heavy_compute = |data: Data| -> Result<Computed, Cancelled> {
                cancelled()?;
                Ok(data.iter().map(|&x| u64::from(x)).product())
            };
            dataset
                .into_par_iter()
                .map(heavy_compute)
                .try_for_each(|computed| {
                    example_sender.send(computed?).unwrap();
                    Ok(())
                })
        });
Then we just construct the DataLoader. You can see the Python impl is identical:
        DataLoader {
            example_worker,
            collate_worker,
            receiver: final_receiver,
            length,
        }
    }
}
// #[pymethods]
impl DataLoader {
    fn __iter__(this: Self /* PyRef<Self> */) -> Self /* PyRef<Self> */ { this }
    fn __next__(&mut self) -> Option<Batch> {
        self.receiver.recv().ok()
    }
    fn __len__(&self) -> usize {
        self.length
    }
}

Issue with coroutines fold function and Firestore

Thank you for taking the time to read my problem.
I'm currently using Firebase Firestore to retrieve a list of objects that I wish to display in the UI. I'm trying to use a suspend function to fold the accumulated values of a sequence of calls to the Firestore server, but at the moment I'm unable to pass the result value outside the scope of the coroutine.
This is my fold function:
suspend fun getFormattedList(): FirestoreState {
    return foldFunctions(FirestoreModel(""), ::getMatchesFromBackend, ...., ....)
}
This is my custom fold function:
suspend fun foldFunctions(model: FirestoreModel,
                          vararg functions: suspend (FirestoreModel, SuccessData) -> FirestoreState): FirestoreState {
    val successData: SuccessData = functions.fold(SuccessData()) { updatedSuccessData, function ->
        val status = function(model, updatedSuccessData)
        if (status !is FirestoreState.Continue) {
            return status
        }
        updatedSuccessData // <--- I managed to retrieve the list of values correctly here
    }
    val successModel = SuccessData()
    successData.matchList?.let { successModel.matchList = it }
    successData.usermatchList?.let { successModel.usermatchList = it }
    successData.formattedList?.let { successModel.formattedList = it }
    return FirestoreState.Success(successModel) // <--- I can't even get to this line with the debugger on
}
This is my first function (which works fine):
suspend fun getMatchesFromBackend(model: FirestoreModel, successData: SuccessData): FirestoreState {
    return try {
        val querySnapshot: QuerySnapshot? = db.collection("matches").get().await()
        querySnapshot?.toObjects(Match::class.java).let { list ->
            val matchList = mutableListOf<Match>()
            list?.let {
                for (document in it) {
                    matchList.add(Match(document.away_score,
                        document.away_team,
                        document.date,
                        document.home_score,
                        document.home_team,
                        document.match_id,
                        document.matchpoints,
                        document.played,
                        document.round,
                        document.tournament))
                }
                successData.matchList = matchList // <--- where the list gets stored
            }
        }
        FirestoreState.Continue
    } catch (e: Exception) {
        when (e) {
            is RuntimeException -> FirestoreState.MatchesFailure
            is ConnectException -> FirestoreState.MatchesFailure
            is CancellationException -> FirestoreState.MatchesFailure
            else -> FirestoreState.MatchesFailure
        }
    }
}
My hypothesis is that the suspend fun gets cancelled and the continuation of the scope gets blocked. I have tried to use runBlocking { } to no avail. If someone has an idea of how to circumvent this issue I'd be very grateful.

Kotlin async calls are executed in sequence unless called with GlobalScope

Okay, so I have a controller method which needs to make a bunch of SOAP calls to an external service, each one quite heavy. I am trying to do these in parallel to save some time, but unless I build the async calls from GlobalScope, the deferreds are resolved in sequence. Let me show you.
Executing the following code
@ResponseBody
@GetMapping(path = ["/buildSoapCall"])
fun searchStations(): String = runBlocking {
    var travels: List<Travel> = service.getTravels().take(500)
    val deferred = travels
        .map {
            async() {
                print("START")
                val result = service.executeSoapCall(it)
                print("END")
                result
            }
        }
    println("Finished deferred")
    val callResults = deferred.awaitAll()
    println("Finished Awaiting")
    ""
}
gives me the following console message:
Finished deferred
START-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-END.....
The - is printed by executeSoapCall.
As you can see, the deferreds are resolved in sequence.
But if I use GlobalScope, like this :
@ResponseBody
@GetMapping(path = ["/buildSoapCall"])
fun searchStations(): String = runBlocking {
    var travels: List<Travel> = service.getTravels().take(500)
    val deferred = travels
        .map {
            GlobalScope.async() {
                print("START")
                val result = service.executeSoapCall(it)
                print("END")
                result
            }
        }
    println("Finished deferred")
    val callResults = deferred.awaitAll()
    println("Finished Awaiting")
    ""
}
I get the following console message:
Finished Treating
STARTSTARTSTARTSTARTSTARTSTARTSTARTSTARTSTARTSTARTSTARTFinished deferred
START-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART--ENDENDSTARTSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-ENDSTART-END...START-END-END-END-END-END-END-END-END-END-END-END-ENDFinished Awaiting
showing that the deferreds all start in parallel. In addition, the total processing time is much shorter.
I don't really understand why I have this behaviour.
Your call to service.executeSoapCall blocks the thread the runBlocking coroutine is running on. You need to start each async coroutine on a different thread to get concurrent behaviour. You can achieve that by using a thread pool, e.g. Dispatchers.IO:
...
async(Dispatchers.IO) {
    print("START")
    val result = service.executeSoapCall(it)
    print("END")
    result
}
...
or creating a new thread on every call:
...
async(newSingleThreadContext("MyThread")) {
    print("START")
    val result = service.executeSoapCall(it)
    print("END")
    result
}
...
GlobalScope works because it uses a thread pool by default, but you should avoid using it. You can read this article by Roman Elizarov on that topic.
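To make the difference concrete, here is a small self-contained sketch of the Dispatchers.IO variant; executeSoapCall is simulated with Thread.sleep to stand in for the blocking SOAP call:
import kotlinx.coroutines.*
import kotlin.system.measureTimeMillis

// Simulated blocking call standing in for service.executeSoapCall().
fun executeSoapCall(id: Int): String {
    Thread.sleep(200)
    return "result $id"
}

fun main() = runBlocking {
    val elapsed = measureTimeMillis {
        val deferred = (1..10).map { id ->
            async(Dispatchers.IO) { executeSoapCall(id) } // blocking work goes to the IO pool
        }
        println(deferred.awaitAll())
    }
    // Roughly 200 ms instead of ~2000 ms, because the ten blocking calls overlap.
    println("finished in ${elapsed}ms")
}
Dispatchers.IO is sized for blocking work; for CPU-bound work Dispatchers.Default is usually the better fit.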

Kotlin async speed test results evaluation

I did some tests comparing the speed of using async as a way of deferring results against using CompletableDeferred in combination with a Job (launch) or startCoroutine to do the same job.
In summary there are 3 use cases:
async with the default start mode (starts right away) [async]
CompletableDeferred + launch (basically Job) [cdl]
CompletableDeferred + startCoroutine [ccdl]
results are presented here:
In short, every iteration of each use-case test generates 10000 async / cdl / ccdl requests and waits for them to complete. This is repeated 225 times, with the first 25 runs as a warm-up (not included in the results), and data points are collected over 100 iterations of the process above (as min, max, avg).
Here is the code:
import com.source.log.log
import kotlinx.coroutines.*
import kotlin.coroutines.Continuation
import kotlin.coroutines.startCoroutine
import kotlin.system.measureNanoTime
import kotlin.system.measureTimeMillis

/**
 * @project Bricks
 * @author SourceOne on 28.11.2019
 */

/* I know that there are better ways to benchmark speed
 * but given the produced results this method is fine enough
 * */
fun benchmark(warmUp: Int, repeat: Int, action: suspend () -> Unit): Pair<List<Long>, List<Long>> {
    val warmUpResults = List(warmUp) {
        measureNanoTime {
            runBlocking {
                action()
            }
        }
    }
    val benchmarkResults = List(repeat) {
        measureNanoTime {
            runBlocking {
                action()
            }
        }
    }
    return warmUpResults to benchmarkResults
}

/* find way to cancel startedCoroutine when deferred is
 * canceled (currently you have to cancel whole context)
 * */
fun <T> CoroutineScope.completable(provider: suspend () -> T): Deferred<T> {
    return CompletableDeferred<T>().also { completable ->
        provider.startCoroutine(
            Continuation(coroutineContext) { result ->
                completable.completeWith(result)
            }
        )
    }
}

suspend fun calculateAsyncStep() = coroutineScope {
    val list = List(10000) {
        async { "i'm a robot" }
    }
    awaitAll(*list.toTypedArray())
}

suspend fun calculateCDLStep() = coroutineScope {
    val list = List(10000) {
        CompletableDeferred<String>().also {
            launch {
                it.complete("i'm a robot")
            }
        }
    }
    awaitAll(*list.toTypedArray())
}

suspend fun calculateCCDLStep() = coroutineScope {
    val list = List(10000) {
        completable { "i'm a robot" }
    }
    awaitAll(*list.toTypedArray())
}

fun main() {
    val labels = listOf("async", "cdl", "ccdl")
    val collectedResults = listOf(
        mutableListOf<Pair<List<Long>, List<Long>>>(),
        mutableListOf(),
        mutableListOf()
    )

    "stabilizing runs".log()
    repeat(2) {
        println("async $it")
        benchmark(warmUp = 25, repeat = 200) {
            calculateAsyncStep()
        }
        println("CDL $it")
        benchmark(warmUp = 25, repeat = 200) {
            calculateCDLStep()
        }
        println("CCDL $it")
        benchmark(warmUp = 25, repeat = 200) {
            calculateCCDLStep()
        }
    }

    "\n#Benchmark start".log()
    val benchmarkTime = measureTimeMillis {
        repeat(100) {
            println("async $it")
            collectedResults[0] += benchmark(warmUp = 25, repeat = 200) {
                calculateAsyncStep()
            }
            println("CDL $it")
            collectedResults[1] += benchmark(warmUp = 25, repeat = 200) {
                calculateCDLStep()
            }
            println("CCDL $it")
            collectedResults[2] += benchmark(warmUp = 25, repeat = 200) {
                calculateCCDLStep()
            }
        }
    }

    "\n#Benchmark completed in ${benchmarkTime}ms".log()
    "#Benchmark results:".log()
    val minMaxAvg = collectedResults.map { stageResults ->
        stageResults.map { (_, benchmark) ->
            arrayOf(
                benchmark.minBy { it }!!, benchmark.maxBy { it }!!, benchmark.average().toLong()
            )
        }
    }
    minMaxAvg.forEachIndexed { index, list ->
        "results for: ${labels[index]} [min, max, avg]".log()
        list.forEach { results ->
            "${results[0]}\t${results[1]}\t${results[2]}".log()
        }
    }
}
It is no surprise that the first two use cases (async and cdl) are very close to each other and that async is always better (because you don't have the overhead of creating a job to complete the deferred object), but comparing async vs CompletableDeferred + startCoroutine there is a huge gap between them (almost 2x) in favor of the latter. Why is there such a big difference, and if anyone knows, why shouldn't we just use a CompletableDeferred + startCoroutine wrapper (like completable() here) instead of async?
Addition1:
Here is a sample for 1000 points:
There are constant spikes in the async and cdl results and some in ccdl (maybe GC?), but there are still far fewer with ccdl. I will rerun these tests with the order of the interleaved tests changed, but it seems to be related to something inside the coroutines machinery.
Edit1:
I've accepted Marko Topolnik's answer, but in addition to it: you can still use this (as he called it) bare launch method if you await the result within the scope in which you launched it.
For example, if you launch a few deferred coroutines (async-style) and await them all at the end of that scope, then the ccdl method works as expected (at least from what I've seen in my tests).
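A minimal sketch of what I mean, reusing the completable() helper defined in the benchmark code above (the values are dummies):
import kotlinx.coroutines.*

// Because every deferred is awaited before the surrounding coroutineScope returns,
// nothing is left running unawaited when the scope completes.
suspend fun loadAll(): List<String> = coroutineScope {
    val deferred = List(3) { i -> completable { "value $i" } }
    deferred.awaitAll()
}

fun main() = runBlocking { println(loadAll()) }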
Since launch and async are built as a layer on top of the low-level primitive createCoroutineUnintercepted(), whereas startCoroutine is practically a direct call into it, there aren't any surprises in your benchmark results.
why shouldn't we just be using CompletableDeferred + startCoroutine wrapper (like completable() here) instead of async?
A comment in your code already hints at the answer:
/*
 * find way to cancel startedCoroutine when deferred is
 * canceled (currently you have to cancel whole context)
 */
The layer you short-circuited with startCoroutine is precisely the layer that handles things such as cancellation, the coroutine hierarchy, exception handling and propagation, and so on.
Here's a simple example that shows you one of the things that break when you replace launch with a bare coroutine:
fun main() = runBlocking {
    bareLaunch {
        try {
            delay(1000)
            println("Coroutine done")
        } catch (e: CancellationException) {
            println("Coroutine cancelled, the exception is: $e")
        }
    }
    delay(10)
}

fun CoroutineScope.bareLaunch(block: suspend () -> Unit) =
    block.startCoroutine(Continuation(coroutineContext) { Unit })

fun <T> CoroutineScope.bareAsync(block: suspend () -> T) =
    CompletableDeferred<T>().also { deferred ->
        block.startCoroutine(Continuation(coroutineContext) { result ->
            result.exceptionOrNull()?.also {
                deferred.completeExceptionally(it)
            } ?: run {
                deferred.complete(result.getOrThrow())
            }
        })
    }
When you run this, you'll see that the bare coroutine gets cancelled after 10 milliseconds: the runBlocking builder didn't realize it had to wait for it to complete. If you replace bareLaunch { with launch {, you'll restore the designed behavior, where the child coroutine completes normally. The same thing happens with bareAsync.

In grails PromiseMap, how to block current thread, for all of the concurrent tasks to complete and return their results?

I need to asynchronously fetch cats, dogs and mice and then do some post-processing. Here is what I am doing:
Promise<List<Cat>> fetchCats = task {}
Promise<List<Mouse>> fetchMice = task { }
Promise<List<Dog>> fetchDogs = task {}
List promiseList = [fetchCats, fetchMice, fetchDogs]
List results = Promises.waitAll(promiseList)
The problem I am facing is that the order of items in the results list is not fixed, i.e. in one execution results can be [cats, dogs, mice] and in another execution it can be [dogs, mice, cats].
This means that to access the cats I need to explicitly check the type of each element of results, and similarly for dogs and mice, which makes my code look bad.
Upon going through the documentation here, I found that the PromiseMap API can help me, as it provides a neat way of accessing results through key-value pairs. Here is what it offers:
import grails.async.*

def map = new PromiseMap()
map['one'] = { 2 * 2 }
map['two'] = { 4 * 4 }
map['three'] = { 8 * 8 }
map.onComplete { Map results ->
    assert [one: 4, two: 16, three: 64] == results
}
Though PromiseMap has an onComplete method, it does not make the current thread wait for all the promises to finish.
Using PromiseMap, how can I block the current thread until all the promises have finished?
If your only concern is making the current thread wait until the PromiseMap completes, you can use Thread.join():
import grails.async.*

def map = new PromiseMap()
map['one'] = { println "task one" }
map['two'] = { println "task two" }
map['three'] = { println "task three" }

Thread t = new Thread() {
    public void run() {
        println("pausing the current thread, let promiseMap complete first")
        map.onComplete { Map results ->
            println("Promisemap processing : " + results)
        }
    }
}
t.start()
t.join()
println("\n CurrentThread : I can win the race if you just comment the t.join() line in the code")
Use .get()
From the PromiseMap source:
/**
 * Synchronously return the populated map with all values obtained from promises used
 * inside the populated map
 *
 * @return A map where the values are obtained from the promises
 */
Map<K, V> get() throws Throwable {
