How can I join all the futures in a vector without cancelling on failure like join_all does?

I've got a Vec of futures created from calling an async function. After adding all the futures to the vector, I'd like to wait on the whole set, getting either a list of results, or a callback for each one that finishes.
I could simply loop or iterate over the vector of futures and call .await on each one; that would let me handle errors correctly and would not have futures::future::join_all cancel the others, but I'm sure there's a more idiomatic way to accomplish this task.
I also want to be able to handle the futures as they complete so if I get enough information from the first few, I can cancel the remaining incomplete futures and not wait for them and discard their results, error or not. This would not be possible if I iterated over the vector in order.
What I'm looking for is a callback (closure, etc) that lets me accumulate the results as they come in so that I can handle errors appropriately or cancel the remainder of the futures (from within the callback) if I determine I don't need the rest of them.
I can tell that's asking for a headache from the borrow checker: trying to modify a future in a Vec in a callback from the async engine.
There are a number of Stack Overflow questions and Reddit posts that explain how join_all joins on a list of futures but cancels the rest if one fails, and how async engines may or may not spawn threads (and whether it's bad design if they do).

Use futures::select_all:
use futures::future; // 0.3.4

type Error = Box<dyn std::error::Error + Send + Sync + 'static>;
type Result<T, E = Error> = std::result::Result<T, E>;

async fn might_fail(fails: bool) -> Result<i32> {
    if fails {
        Err("boom".into())
    } else {
        Ok(42)
    }
}

async fn many() -> Result<i32> {
    let raw_futs = vec![might_fail(true), might_fail(false), might_fail(true)];
    // select_all requires Unpin futures, so pin each one to the heap.
    let unpin_futs: Vec<_> = raw_futs.into_iter().map(Box::pin).collect();
    let mut futs = unpin_futs;

    let mut sum = 0;
    while !futs.is_empty() {
        match future::select_all(futs).await {
            (Ok(val), _index, remaining) => {
                sum += val;
                futs = remaining;
            }
            (Err(_e), _index, remaining) => {
                // Ignoring all errors
                futs = remaining;
            }
        }

        if sum > 42 {
            // Early exit; dropping `futs` cancels the remaining futures.
            return Ok(sum);
        }
    }

    Ok(sum)
}
This will poll all the futures in the collection, returning the output of the first one that completes, along with its index and the remaining pending or unpolled futures. You can then match on the Result and handle the success or failure case.
Call select_all inside a loop. This gives you the ability to exit early from the function. When you exit, the futs vector is dropped, and dropping a future cancels it.

As mentioned in the comments by @kmdreko, @John Kugelman, and @e-net4-the-comment-flagger, use join_all from the futures crate version 0.3. The join_all in futures version 0.2 has the cancel behavior described in the question, but the join_all in futures version 0.3 does not.

For a quick/hacky solution without having to mess with select_all, you can use join_all: instead of returning Result<T>, return Result<Result<T>>, always returning Ok for the top-level result.
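A minimal sketch of that wrapping, reusing the might_fail, Result, and Error definitions from the select_all answer above (with futures 0.3's join_all the wrapping is redundant, but it is exactly what keeps a short-circuiting combinator such as try_join_all from cancelling anything):

use futures::future::join_all; // 0.3

async fn all_results() -> Vec<Result<i32>> {
    let futs = vec![might_fail(true), might_fail(false), might_fail(true)];
    // Wrap each future so its real Result becomes the success value; the
    // top-level Result is always Ok, so nothing registers as a failure.
    let wrapped = futs.into_iter().map(|f| async move { Ok::<_, Error>(f.await) });
    join_all(wrapped)
        .await
        .into_iter()
        .map(|outer| outer.unwrap()) // the outer Result is always Ok
        .collect()
}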

Or you can use join_all
The example from the documentation:
use futures::future::join_all;
async fn foo(i: u32) -> u32 { i }
let futures = vec![foo(1), foo(2), foo(3)];
assert_eq!(join_all(futures).await, [1, 2, 3]);
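The doc example relies on an async context for the .await; outside a doc test you can drive it with a minimal executor, e.g. (a sketch using futures::executor::block_on):

use futures::future::join_all;

async fn foo(i: u32) -> u32 { i }

fn main() {
    let futures = vec![foo(1), foo(2), foo(3)];
    // block_on drives the joined future to completion on the current thread.
    let results = futures::executor::block_on(join_all(futures));
    assert_eq!(results, [1, 2, 3]);
}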

Related

Why is returning a cloned ndarray throwing an overflow error (exceeded max recursion limit)?

I'm currently trying to write a function that is generally equivalent to numpy's tile. Currently, each time I try to return an (altered or unaltered) clone of the input array, I get a warning about an overflow, and cargo prompts me to increase the recursion limit. However, this function isn't recursive, so I'm assuming it's happening somewhere in the implementation.
Here is the stripped-down function (full version):
pub fn tile<A, D1, D2>(arr: &Array<A, D1>, reps: Vec<usize>) -> Array<A, D2>
where
    A: Clone,
    D1: Dimension,
    D2: Dimension,
{
    let num_of_reps = reps.len();
    // just clone the array if reps is all ones
    let mut res = arr.clone();
    let mut bail_flag = true;
    for &x in reps.iter() {
        if x != 1 {
            bail_flag = false;
        }
    }
    if bail_flag {
        let mut res_dim = res.shape().to_owned();
        _new_shape(num_of_reps, res.ndim(), &mut res_dim);
        res.to_shape(res_dim);
        return res;
    }
    ...
    // otherwise do extra work
    ...
    return res.reshape(shape_out);
}
this is the actual error I'm getting on returning res:
overflow evaluating the requirement `&ArrayBase<_, _>: Neg`
consider increasing the recursion limit by adding a `#![recursion_limit = "1024"]` attribute to your crate (`mfcc_2`)
required because of the requirements on the impl of `Neg` for `&ArrayBase<_, _>`
511 redundant requirements hidden
required because of the requirements on the impl of `Neg` for `&ArrayBase<OwnedRepr<A>, D1>` [rustc E0275]
I looked at the implementation of Neg in ndarray; it doesn't seem to be recursive, so I'm a little confused as to what is going on.
P.S. I'm aware there are other errors in this code, as those appeared after I switched from A to f64 (the actual type I plan on using the function with), but those are mostly trivial to fix. Still, if you have suggestions on any error you see, I appreciate them nonetheless.

Techniques for turning recursive functions into iterators in Rust?

I'm struggling to turn a simple recursive function into a simple iterator. The problem is that the recursive function maintains state in its local variables and call stack -- and to turn this into a rust iterator means basically externalizing all the function state into mutable properties on some custom iterator struct. It's quite a messy endeavor.
In a language like javascript or python, yield comes to the rescue. Are there any techniques in Rust to help manage this complexity?
Simple example using yield (pseudocode):
function one_level(state, depth, max_depth) {
    if depth == max_depth {
        return
    }
    for s in next_states_from(state) {
        yield state_to_value(s);
        yield one_level(s, depth+1, max_depth);
    }
}
To make something similar work in Rust, I'm basically creating a Vec<Vec<State>> on my iterator struct, to reflect the data returned by next_states_from at each level of the call stack. Then for each next() invocation, carefully popping pieces off of this to restore state. I feel like I may be missing something.
You are performing a (depth-limited) depth-first search on your state graph. You can do it iteratively by using a single stack of unprocessed subtrees (depending on your state graph structure).
struct Iter {
    stack: Vec<(State, u32)>,
    max_depth: u32,
}

impl Iter {
    fn new(root: State, max_depth: u32) -> Self {
        Self {
            stack: vec![(root, 0)],
            max_depth,
        }
    }
}

impl Iterator for Iter {
    type Item = u32; // return type of state_to_value

    fn next(&mut self) -> Option<Self::Item> {
        let (state, depth) = self.stack.pop()?;
        if depth < self.max_depth {
            for s in next_states_from(state) {
                self.stack.push((s, depth + 1));
            }
        }
        Some(state_to_value(state))
    }
}
There are some slight differences to your code:
The iterator yields the value of the root element, while your version does not. This can be easily fixed using .skip(1)
Children are processed in right-to-left order (reversed from the result of next_states_from). If you need the original order, reverse the order in which the next states are pushed (depending on the result type of next_states_from you can just use .rev(); otherwise you will need a temporary). Both points are illustrated in the sketch below.
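As a toy check of both points, here is a sketch where State is just u32 and next_states_from / state_to_value are stand-ins for the question's functions:

type State = u32;

// Stand-in: each state below 4 has two children, forming a small binary tree.
fn next_states_from(state: State) -> Vec<State> {
    if state < 4 {
        vec![state * 2, state * 2 + 1]
    } else {
        vec![]
    }
}

// Stand-in: the yielded value is just the state itself.
fn state_to_value(state: State) -> u32 {
    state
}

fn main() {
    // skip(1) drops the root value, matching the recursive version.
    // Prints [3, 7, 6, 2, 5, 4]: children come out right-to-left; pushing
    // with .rev() inside next() would restore left-to-right order.
    let values: Vec<u32> = Iter::new(1, 2).skip(1).collect();
    println!("{:?}", values);
}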

Is there a better functional way to process a vector with error checking?

I'm learning Rust and would like to know how I can improve the code below.
I have a vector of tuples of form (u32, String). The u32 values represent line numbers and the Strings are the text on the corresponding lines. As long as all the String values can be successfully parsed as integers, I want to return an Ok<Vec<i32>> containing the just parsed String values, but if not I want to return an error of some form (just an Err<String> in the example below).
I'm trying to learn to avoid mutability and use functional styles where appropriate, and the above is straightforward to do functionally if that was all that was needed. Here's what I came up with in this case:
fn data_vals(sv: &Vec<(u32, String)>) -> Result<Vec<i32>, String> {
    sv.iter()
        .map(|s| {
            s.1.parse::<i32>()
                .map_err(|_e| "*** Invalid data.".to_string())
        })
        .collect()
}
However, the small catch is that I want to print an error message for every invalid value (and not just the first one), and the error messages should contain both the line number and the string values in the offending tuple.
I've managed to do it with the following code:
fn data_vals(sv: &Vec<(u32, String)>) -> Result<Vec<i32>, String> {
    sv.iter()
        .map(|s| {
            (s.0, s.1.parse::<i32>().or_else(|e| {
                eprintln!("ERROR: Invalid data value at line {}: '{}'", s.0, s.1);
                Err(e)
            }))
        })
        .collect::<Vec<(u32, Result<i32, _>)>>() // Collect here to avoid short-circuit
        .iter()
        .map(|i| i.1.clone().map_err(|_e| "*** Invalid data.".to_string()))
        .collect()
}
This works, but seems rather messy and cumbersome - especially the typed collect() in the middle to avoid short-circuiting so all the errors are printed. The clone() call is also annoying, and I'm not really sure why it's needed - the compiler says I'm moving out of borrowed content otherwise, but I'm not really sure what's being moved. Is there a way it can be done more cleanly? Or should I go back to a more procedural style? When I tried, I ended up with mutable variables and a flag to indicate success and failure, which seems less elegant:
fn data_vals(sv: &Vec<(u32, String)>) -> Result<Vec<i32>, String> {
    let mut datavals = Vec::new();
    let mut success = true;
    for s in sv {
        match s.1.parse::<i32>() {
            Ok(v) => datavals.push(v),
            Err(_e) => {
                eprintln!("ERROR: Invalid data value at line {}: '{}'", s.0, s.1);
                success = false;
            }
        }
    }
    if success {
        Ok(datavals)
    } else {
        Err("*** Invalid data.".to_string())
    }
}
Can someone advise me on the best way to do this? Should I stick to the procedural style here, and if so can that be improved? Or is there a cleaner functional way to do it? Or a blend of the two? Any advice appreciated.
I think that's what partition_map() from itertools is for:
use itertools::{Either, Itertools};

fn data_vals<'a>(sv: &[&'a str]) -> Result<Vec<i32>, Vec<(&'a str, std::num::ParseIntError)>> {
    let (successes, failures): (Vec<_>, Vec<_>) =
        sv.iter().partition_map(|s| match s.parse::<i32>() {
            Ok(v) => Either::Left(v),
            Err(e) => Either::Right((*s, e)),
        });
    if !failures.is_empty() {
        Err(failures)
    } else {
        Ok(successes)
    }
}

fn main() {
    let numbers = vec!["42", "aaaezrgggtht", "..4rez41eza", "55"];
    println!("{:#?}", data_vals(&numbers));
}
In a purely functional style, you have to avoid side-effects.
Printing errors is a side-effect. The preferred style would be to return a value of the form:
Result<Vec<i32>, Vec<String>>
and print the list after the data_vals function returns.
So, essentially, you want your processing to collect a list of integers, and a list of strings:
fn data_vals(sv: &Vec<(u32, String)>) -> Result<Vec<i32>, Vec<String>> {
    let (ok, err): (Vec<_>, Vec<_>) = sv
        .iter()
        .map(|(i, s)| {
            s.parse()
                .map_err(|_e| format!("ERROR: Invalid data value at line {}: '{}'", i, s))
        })
        .partition(|e| e.is_ok());
    if !err.is_empty() {
        Err(err.iter().filter_map(|e| e.clone().err()).collect())
    } else {
        Ok(ok.iter().filter_map(|e| e.clone().ok()).collect())
    }
}

fn main() {
    let input = vec![(1, "0".to_string())];
    let r = data_vals(&input);
    assert_eq!(r, Ok(vec![0]));

    let input = vec![(1, "zzz".to_string())];
    let r = data_vals(&input);
    assert_eq!(r, Err(vec!["ERROR: Invalid data value at line 1: 'zzz'".to_string()]));
}
Playground Link
This uses partition which does not depend on an external crate.
Side effects (eprintln!) in an iterator adapter are definitely not "functional". You should accumulate and return the errors and let the caller deal with them.
I would use fold here. The goal of fold is to reduce a list to a single value, starting from an initial value and augmenting the result with every item. This "single value" can very well be a list, though. Here, though, there are two possible lists we might want to return: a list of i32 if all values are valid, or a list of errors if there are any errors (I've chosen to return Strings for errors here, for simplicity.)
fn data_vals(sv: &[(u32, String)]) -> Result<Vec<i32>, Vec<String>> {
    sv.iter().fold(
        Ok(Vec::with_capacity(sv.len())),
        |acc, (line_number, data)| {
            let data = data
                .parse::<i32>()
                .map_err(|_| format!("Invalid data value at line {}: '{}'", line_number, data));
            match (acc, data) {
                (Ok(mut acc_data), Ok(this_data)) => {
                    // No errors yet; push the parsed value to the values vector.
                    acc_data.push(this_data);
                    Ok(acc_data)
                }
                (Ok(..), Err(this_error)) => {
                    // First error: replace the accumulator with an `Err` containing the first error.
                    Err(vec![this_error])
                }
                (Err(acc_errors), Ok(..)) => {
                    // There have been errors, but this item is valid; ignore it.
                    Err(acc_errors)
                }
                (Err(mut acc_errors), Err(this_error)) => {
                    // One more error: push it to the error vector.
                    acc_errors.push(this_error);
                    Err(acc_errors)
                }
            }
        },
    )
}

fn main() {
    println!("{:?}", data_vals(&[]));
    println!("{:?}", data_vals(&[(1, "123".into())]));
    println!("{:?}", data_vals(&[(1, "123a".into())]));
    println!("{:?}", data_vals(&[(1, "123".into()), (2, "123a".into())]));
    println!("{:?}", data_vals(&[(1, "123a".into()), (2, "123".into())]));
    println!("{:?}", data_vals(&[(1, "123a".into()), (2, "123b".into())]));
}
The initial value is Ok(Vec::with_capacity(sv.len())) (this is an optimization to avoid reallocating the vector as we push items to it; a simpler version would be Ok(vec![])). If the slice is empty, this will be fold's result; the closure will never be called.
For each item, the closure checks 1) whether there were any errors so far (indicated by the accumulator value being an Err) or not and 2) whether the current item is valid or not. I'm matching on two Result values simultaneously (by combining them in a tuple) to handle all 4 cases. The closure then returns an Ok if there are no errors so far (with all the parsed values so far) or an Err if there are any errors so far (with every invalid value found so far).
You'll notice I used the push method to add an item to a Vec. This is, strictly speaking, mutation, which is not considered "functional", but because we are moving the Vecs here, we know there are no other references to them, so we know we aren't affecting any other use of these Vecs.

What is the purpose of async/await in Rust?

In a language like C#, given this code (I am not using the await keyword on purpose):
async Task Foo()
{
    var task = LongRunningOperationAsync();

    // Some other non-related operation
    AnotherOperation();

    result = task.Result;
}
In the first line, the long operation is run in another thread, and a Task is returned (that is a future). You can then do another operation that will run in parallel with the first one, and at the end, you can wait for the operation to be finished. I think that is also the behavior of async/await in Python, JavaScript, etc.
On the other hand, in Rust, I read in the RFC that:
A fundamental difference between Rust's futures and those from other languages is that Rust's futures do not do anything unless polled. The whole system is built around this: for example, cancellation is dropping the future for precisely this reason. In contrast, in other languages, calling an async fn spins up a future that starts executing immediately.
In this situation, what is the purpose of async/await in Rust? Judging from other languages, this notation is a convenient way to run parallel operations, but I cannot see how it works in Rust if calling an async function does not run anything.
You are conflating a few concepts.
Concurrency is not parallelism, and async and await are tools for concurrency, which may sometimes mean they are also tools for parallelism.
Additionally, whether a future is immediately polled or not is orthogonal to the syntax chosen.
async / await
The keywords async and await exist to make creating and interacting with asynchronous code easier to read and look more like "normal" synchronous code. This is true in all of the languages that have such keywords, as far as I am aware.
Simpler code
This is code that creates a future that adds two numbers when polled:
before
fn long_running_operation(a: u8, b: u8) -> impl Future<Output = u8> {
    struct Value(u8, u8);

    impl Future for Value {
        type Output = u8;

        fn poll(self: Pin<&mut Self>, _ctx: &mut Context) -> Poll<Self::Output> {
            Poll::Ready(self.0 + self.1)
        }
    }

    Value(a, b)
}
after
async fn long_running_operation(a: u8, b: u8) -> u8 {
    a + b
}
Note that the "before" code is basically the implementation of today's poll_fn function
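For comparison, a sketch of the same future written with std::future::poll_fn (stable since Rust 1.64; the futures crate offers an equivalent futures::future::poll_fn):

use std::future::{poll_fn, Future};
use std::task::Poll;

fn long_running_operation(a: u8, b: u8) -> impl Future<Output = u8> {
    // poll_fn builds a future directly from a closure that plays the role
    // of the hand-written poll above.
    poll_fn(move |_ctx| Poll::Ready(a + b))
}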
See also Peter Hall's answer about how keeping track of many variables can be made nicer.
References
One of the potentially surprising things about async/await is that it enables a specific pattern that wasn't possible before: using references in futures. Here's some code that fills up a buffer with a value in an asynchronous manner:
before
use std::io;

fn fill_up<'a>(buf: &'a mut [u8]) -> impl Future<Output = io::Result<usize>> + 'a {
    futures::future::lazy(move |_| {
        for b in buf.iter_mut() {
            *b = 42
        }
        Ok(buf.len())
    })
}

fn foo() -> impl Future<Output = Vec<u8>> {
    let mut data = vec![0; 8];
    fill_up(&mut data).map(|_| data)
}
This fails to compile:
error[E0597]: `data` does not live long enough
  --> src/main.rs:33:17
   |
33 |     fill_up_old(&mut data).map(|_| data)
   |                 ^^^^^^^^^ borrowed value does not live long enough
34 | }
   | - `data` dropped here while still borrowed
   |
   = note: borrowed value must be valid for the static lifetime...

error[E0505]: cannot move out of `data` because it is borrowed
  --> src/main.rs:33:32
   |
33 |     fill_up_old(&mut data).map(|_| data)
   |                 ---------      ^^^ ---- move occurs due to use in closure
   |                 |              |
   |                 |              move out of `data` occurs here
   |                 borrow of `data` occurs here
   |
   = note: borrowed value must be valid for the static lifetime...
after
use std::io;

async fn fill_up(buf: &mut [u8]) -> io::Result<usize> {
    for b in buf.iter_mut() {
        *b = 42
    }
    Ok(buf.len())
}

async fn foo() -> Vec<u8> {
    let mut data = vec![0; 8];
    fill_up(&mut data).await.expect("IO failed");
    data
}
This works!
Calling an async function does not run anything
The implementation and design of a Future and the entire system around futures, on the other hand, are unrelated to the keywords async and await. Indeed, Rust had a thriving asynchronous ecosystem (such as with Tokio) before the async / await keywords ever existed. The same was true for JavaScript.
Why aren't Futures polled immediately on creation?
For the most authoritative answer, check out this comment from withoutboats on the RFC pull request:
A fundamental difference between Rust's futures and those from other
languages is that Rust's futures do not do anything unless polled. The
whole system is built around this: for example, cancellation is
dropping the future for precisely this reason. In contrast, in other
languages, calling an async fn spins up a future that starts executing
immediately.
A point about this is that async & await in Rust are not inherently
concurrent constructions. If you have a program that only uses async &
await and no concurrency primitives, the code in your program will
execute in a defined, statically known, linear order. Obviously, most
programs will use some kind of concurrency to schedule multiple,
concurrent tasks on the event loop, but they don't have to. What this
means is that you can - trivially - locally guarantee the ordering of
certain events, even if there is nonblocking IO performed in between
them that you want to be asynchronous with some larger set of nonlocal
events (e.g. you can strictly control ordering of events inside of a
request handler, while being concurrent with many other request
handlers, even on two sides of an await point).
This property gives Rust's async/await syntax the kind of local
reasoning & low-level control that makes Rust what it is. Running up
to the first await point would not inherently violate that - you'd
still know when the code executed, it would just execute in two
different places depending on whether it came before or after an
await. However, I think the decision made by other languages to start
executing immediately largely stems from their systems which
immediately schedule a task concurrently when you call an async fn
(for example, that's the impression of the underlying problem I got
from the Dart 2.0 document).
Some of the Dart 2.0 background is covered by this discussion from munificent:
Hi, I'm on the Dart team. Dart's async/await was designed mainly by
Erik Meijer, who also worked on async/await for C#. In C#, async/await
is synchronous to the first await. For Dart, Erik and others felt that
C#'s model was too confusing and instead specified that an async
function always yields once before executing any code.
At the time, I and another on my team were tasked with being the
guinea pigs to try out the new in-progress syntax and semantics in our
package manager. Based on that experience, we felt async functions
should run synchronously to the first await. Our arguments were
mostly:
Always yielding once incurs a performance penalty for no good reason. In most cases, this doesn't matter, but in some it really
does. Even in cases where you can live with it, it's a drag to bleed a
little perf everywhere.
Always yielding means certain patterns cannot be implemented using async/await. In particular, it's really common to have code like
(pseudo-code here):
getThingFromNetwork():
    if (downloadAlreadyInProgress):
        return cachedFuture
    cachedFuture = startDownload()
    return cachedFuture
In other words, you have an async operation that you can call multiple times before it completes. Later calls use the same
previously-created pending future. You want to ensure you don't start
the operation multiple times. That means you need to synchronously
check the cache before starting the operation.
If async functions are async from the start, the above function can't use async/await.
We pleaded our case, but ultimately the language designers stuck with
async-from-the-top. This was several years ago.
That turned out to be the wrong call. The performance cost is real
enough that many users developed a mindset that "async functions are
slow" and started avoiding using it even in cases where the perf hit
was affordable. Worse, we see nasty concurrency bugs where people
think they can do some synchronous work at the top of a function and
are dismayed to discover they've created race conditions. Overall, it
seems users do not naturally assume an async function yields before
executing any code.
So, for Dart 2, we are now taking the very painful breaking change to
change async functions to be synchronous to the first await and
migrating all of our existing code through that transition. I'm glad
we're making the change, but I really wish we'd done the right thing
on day one.
I don't know if Rust's ownership and performance model place different
constraints on you where being async from the top really is better,
but from our experience, sync-to-the-first-await is clearly the better
trade-off for Dart.
cramert replies (note that some of this syntax is outdated now):
If you need code to execute immediately when a function is called
rather than later on when the future is polled, you can write your
function like this:
fn foo() -> impl Future<Item = Thing> {
    println!("prints immediately");
    async_block! {
        println!("prints when the future is first polled");
        await!(bar());
        await!(baz())
    }
}
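With today's stable syntax (Rust 1.39+), that snippet would read roughly like this (a sketch: Thing, bar, and baz are placeholders carried over from the quote):

fn foo() -> impl Future<Output = Thing> {
    println!("prints immediately");
    async {
        println!("prints when the future is first polled");
        bar().await;
        baz().await
    }
}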
Code examples
These examples use the async support in Rust 1.39 and the futures crate 0.3.1.
Literal transcription of the C# code
use futures; // 0.3.1

async fn long_running_operation(a: u8, b: u8) -> u8 {
    println!("long_running_operation");
    a + b
}

fn another_operation(c: u8, d: u8) -> u8 {
    println!("another_operation");
    c * d
}

async fn foo() -> u8 {
    println!("foo");

    let sum = long_running_operation(1, 2);

    another_operation(3, 4);

    sum.await
}

fn main() {
    let task = foo();

    futures::executor::block_on(async {
        let v = task.await;
        println!("Result: {}", v);
    });
}
If you called foo, the sequence of events in Rust would be:
Something implementing Future<Output = u8> is returned.
That's it. No "actual" work is done yet. If you take the result of foo and drive it towards completion (by polling it, in this case via futures::executor::block_on), then the next steps are:
Something implementing Future<Output = u8> is returned from calling long_running_operation (it does not start work yet).
another_operation does work as it is synchronous.
the .await syntax causes the code in long_running_operation to start. The foo future will continue to return "not ready" until the computation is done.
The output would be:
foo
another_operation
long_running_operation
Result: 3
Note that there are no thread pools here: this is all done on a single thread.
async blocks
You can also use async blocks:
use futures::{future, FutureExt}; // 0.3.1

fn long_running_operation(a: u8, b: u8) -> u8 {
    println!("long_running_operation");
    a + b
}

fn another_operation(c: u8, d: u8) -> u8 {
    println!("another_operation");
    c * d
}

async fn foo() -> u8 {
    println!("foo");

    let sum = async { long_running_operation(1, 2) };
    let oth = async { another_operation(3, 4) };

    let both = future::join(sum, oth).map(|(sum, _)| sum);

    both.await
}
Here we wrap synchronous code in an async block and then wait for both actions to complete before this function will be complete.
Note that wrapping synchronous code like this is not a good idea for anything that will actually take a long time; see What is the best approach to encapsulate blocking I/O in future-rs? for more info.
With a threadpool
// Requires the `thread-pool` feature to be enabled
use futures::{executor::ThreadPool, future, task::SpawnExt, FutureExt};

async fn foo(pool: &mut ThreadPool) -> u8 {
    println!("foo");

    let sum = pool
        .spawn_with_handle(async { long_running_operation(1, 2) })
        .unwrap();
    let oth = pool
        .spawn_with_handle(async { another_operation(3, 4) })
        .unwrap();

    let both = future::join(sum, oth).map(|(sum, _)| sum);

    both.await
}
The purpose of async/await in Rust is to provide a toolkit for concurrency—same as in C# and other languages.
In C# and JavaScript, async methods start running immediately, and they're scheduled whether you await the result or not. In Python and Rust, when you call an async method, nothing happens (it isn't even scheduled) until you await it. But it's largely the same programming style either way.
The ability to spawn another task (that runs concurrent with and independent of the current task) is provided by libraries: see async_std::task::spawn and tokio::task::spawn.
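For example, a minimal sketch with Tokio (assuming the tokio crate with its rt-multi-thread and macros features enabled):

#[tokio::main]
async fn main() {
    // spawn starts the task immediately; it runs concurrently with
    // (and independently of) the current task.
    let handle = tokio::task::spawn(async { 1 + 2 });

    // Awaiting the JoinHandle only waits for the already-running task.
    let v = handle.await.expect("task panicked");
    println!("{}", v);
}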
As for why Rust async is not exactly like C#, well, consider the differences between the two languages:
Rust discourages global mutable state. In C# and JS, every async method call is implicitly added to a global mutable queue. It's a side effect to some implicit context. For better or worse, that's not Rust's style.
Rust is not a framework. It makes sense that C# provides a default event loop. It also provides a great garbage collector! Lots of things that come standard in other languages are optional libraries in Rust.
Consider this simple pseudo-JavaScript code that fetches some data, processes it, fetches some more data based on the previous step, summarises it, and then prints a result:
getData(url)
    .then(response -> parseObjects(response.data))
    .then(data -> findAll(data, 'foo'))
    .then(foos -> getWikipediaPagesFor(foos))
    .then(sumPages)
    .then(sum -> console.log("sum is: ", sum));
In async/await form, that's:
async {
    let response = await getData(url);
    let objects = parseObjects(response.data);
    let foos = findAll(objects, 'foo');
    let pages = await getWikipediaPagesFor(foos);
    let sum = sumPages(pages);
    console.log("sum is: ", sum);
}
It introduces a lot of single-use variables and is arguably worse than the original version with promises. So why bother?
Consider this change, where the variables response and objects are needed later on in the computation:
async {
    let response = await getData(url);
    let objects = parseObjects(response.data);
    let foos = findAll(objects, 'foo');
    let pages = await getWikipediaPagesFor(foos);
    let sum = sumPages(pages, objects.length);
    console.log("sum is: ", sum, " and status was: ", response.status);
}
And try to rewrite it in the original form with promises:
getData(url)
    .then(response -> Promise.resolve(parseObjects(response.data))
        .then(objects -> Promise.resolve(findAll(objects, 'foo'))
            .then(foos -> getWikipediaPagesFor(foos))
            .then(pages -> sumPages(pages, objects.length)))
        .then(sum -> console.log("sum is: ", sum, " and status was: ", response.status)));
Each time you need to refer back to a previous result, you need to nest the entire structure one level deeper. This can quickly become very difficult to read and maintain, but the async/await version does not suffer from this problem.

Why do I have to wrap an Async<T> into another async workflow and let! it?

I'm trying to understand async workflows in F# but I found one part that I really don't understand.
The following code works fine:
let asynWorkflow = async {
    let! result = Stream.TryOpenAsync(partition) |> Async.AwaitTask
    return result
}

let stream =
    Async.RunSynchronously asynWorkflow
    |> fun openResult -> if openResult.Found then openResult.Stream else Stream(partition)
I define an async workflow where TryOpenAsync returns a Task<StreamOpenResult>. I convert it to Async<StreamOpenResult> with Async.AwaitTask. (Side quest: "Await"Task? It doesn't await it, it just converts it, doesn't it? I think it has nothing to do with Task.Wait or the await keyword.) I "await" it with let! and return it.
To start the workflow I use RunSynchronously, which should start the workflow and return the result (bind it). On the result I check whether the Stream is Found or not.
But now to my first question. Why do I have to wrap the TryOpenAsync call in another async computation and let! ("await") it?
E.g. the following code does not work:
let asynWorkflow = Stream.TryOpenAsync(partition) |> Async.AwaitTask

let stream =
    Async.RunSynchronously asynWorkflow
    |> fun openResult -> if openResult.Found then openResult.Stream else Stream(partition)
I thought AwaitTask makes it an Async<T> and RunSynchronously should start it; then I use the result. What am I missing?
My second question is: why isn't there any "Async.Let!" function available? Maybe because it does not work; or better, why doesn't it work with the following code?
let ``let!`` task = async {
    let! result = task |> Async.AwaitTask
    return result
}

let stream =
    Async.RunSynchronously (``let!`` (Stream.TryOpenAsync(partition)))
    |> fun openResult -> if openResult.Found then openResult.Stream else Stream(partition)
I just insert the TryOpenAsync as a parameter, but it does not work. By "does not work" I mean the whole FSI will hang, so it has something to do with my async/"await".
--- Update:
Result of working code in FSI:
>
Real: 00:00:00.051, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
val asynWorkflow : Async<StreamOpenResult>
val stream : Stream
Result of not working code in FSI:
>
And you cannot execute anything in the FSI anymore
--- Update 2
I'm using Streamstone. Here the C# example: https://github.com/yevhen/Streamstone/blob/master/Source/Example/Scenarios/S04_Write_to_stream.cs
and here the Stream.TryOpenAsync: https://github.com/yevhen/Streamstone/blob/master/Source/Streamstone/Stream.Api.cs#L192
I can't tell you why the second example doesn't work without knowing what Stream and partition are and how they work.
However, I want to take this opportunity to point out that the two examples are not strictly equivalent.
F# async is kind of like a "recipe" for what to do. When you write async { ... }, the resulting computation is just sitting there, not actually doing anything. It's more like declaring a function than like issuing a command. Only when you "start" it by calling something like Async.RunSynchronously or Async.Start does it actually run. A corollary is that you can start the same async workflow multiple times, and it's going to be a new workflow every time. Very similar to how IEnumerable works.
C# Task, on the other hand, is more like a "reference" to an async computation that is already running. The computation starts as soon as you call Stream.TryOpenAsync(partition), and it's impossible to obtain a Task instance before the task actually starts. You can await the resulting Task multiple times, but each await will not result in a fresh attempt to open a stream. Only the first await will actually wait for the task's completion, and every subsequent one will just return you the same remembered result.
In the async/reactive lingo, F# async is what you call "cold", while C# Task is referred to as "hot".
The second code block looks like it should work to me. It does run if I provide dummy implementations for Stream and StreamOpenResult.
You should avoid using Async.RunSynchronously wherever possible because it defeats the purpose of async. Put all of this code within a larger async block and then you will have access to the StreamOpenResult:
async {
    let! openResult = Stream.TryOpenAsync(partition) |> Async.AwaitTask
    let stream = if openResult.Found then openResult.Stream else Stream(partition)
    return () // now do something with the stream
}
You may need to put a Async.Start or Async.RunSynchronously at the very outer edge of your program to actually run it, but it's better if you have the async (or convert it to a Task) and pass it to some other code (e.g. a web framework) that can call it in a non-blocking manner.
Not that I want to answer your question with another question, but: why are you writing code like this anyway? Understanding that might help. Why not just:
let asyncWorkflow = async {
    let! result = Stream.TryOpenAsync(partition) |> Async.AwaitTask
    if result.Found then return result.Stream else return Stream(partition)
}
There's little point in creating an async workflow only to immediately call RunSynchronously on it - it's similar to calling .Result on a Task - it just blocks the current thread until the workflow returns.
