Related
I want to learn the rust Tokio library. To facilitate this I want to write an ASYNC TCP logger in rust.
Basically a TCP client that connects to a TCP server (172.16.10.10 port 7777) and just logs messages received asynchronously file to a log file. I want the main function to read user input from the console - in my case was for pressing ‘q’ key - simulate the program doing some other task.
I expect to receive multiple TCP responses whilst waiting for user to press ‘q’ key.
I am trying to workout how to read and log multiple TCP responses independently of waiting for the user input
let mut buf_reader = BufReader::new(&stream);
let mut data = vec![];
buf_reader.read_to_end(&mut data).await.unwrap();
log_writer.write_all(&data).await.unwrap();`
Here is the code I have
use tokio::net::TcpStream;
use tokio::prelude::*;
use std::io::{stdin, stdout, Write, BufWriter, BufReader};
use std::fs::File;
#[tokio::main]
async fn main() {
let ip = "172.16.10.10:7777";
let mut stream = TcpStream::connect(ip).await.unwrap();
let message = [0x16, 0x02];
stream.write(&message).await.unwrap();
// Open a file for logging
let file = File::create("log.txt").unwrap();
let mut log_writer = BufWriter::new(file);
println!("Press 'q' to exit and receive response:");
stdout().flush().unwrap();
let mut input = String::new();
stdin().read_line(&mut input).unwrap();
if input.trim() == "q" {
// SIMULATE doing time consuming task
println!(“Quitting”);
}
}
I tried the following but this loops over the waiting for user input. This is not behaviour I want. I want to be reading and logging the TCP messages independent of the awaiting user inout.
loop {
stdin().read_line(&mut input).unwrap();
if input.trim() == "q" {
break;
}
let mut data = vec![];
buf_reader.read_to_end(&mut data).await.unwrap();
log_writer.write_all(&data).await.unwrap();
}
When I needed async multithreading, I defined multiple async fn to do what I want, then called them in async fn main as:
let handle1 = tokio::spawn(do_it("test_data.txt"));
let handle2 = tokio::spawn(do_something_else("test_data.txt"));
handle1.await.unwrap();
handle2.await.unwrap();
Since I'm zealous about keeping fn main as minimal as possible, this may not exactly work for you, but may give you a direction.
I was recently informed that in
async {
return! async { return "hi" } }
|> Async.RunSynchronously
|> printfn "%s"
the nested Async<'T> (async { return 1 }) would not be sent to the thread pool for evaluation, whereas in
async {
use ms = new MemoryStream [| 0x68uy; 0x69uy |]
use sr = new StreamReader (ms)
return! sr.ReadToEndAsync () |> Async.AwaitTask }
|> Async.RunSynchronously
|> printfn "%s"
the nested Async<'T> (sr.ReadToEndAsync () |> Async.AwaitTask) would be. What is it about an Async<'T> that decides whether it's sent to the thread pool when it's executed in an asynchronous operation like let! or return!? In particular, how would you define one which is sent to the thread pool? What code do you have to include in the async block, or in the lambda passed into Async.FromContinuations?
TL;DR: It's not quite like that. The async itself doesn't "send" anything to the thread pool. All it does is just run continuations until they stop. And if one of those continuations decides to continue on a new thread - well, that's when thread switching happens.
Let's set up a small example to illustrate what happens:
let log str = printfn $"{str}: thread = {Thread.CurrentThread.ManagedThreadId}"
let f = async {
log "1"
let! x = async { log "2"; return 42 }
log "3"
do! Async.Sleep(TimeSpan.FromSeconds(3.0))
log "4"
}
log "starting"
f |> Async.StartImmediate
log "started"
Console.ReadLine()
If you run this script, it will print, starting, then 1, 2, 3, then started, then wait 3 seconds, and then print 4, and all of them except 4 will have the same thread ID. You can see that everything until Async.Sleep is executed synchronously on the same thread, but after that async execution stops and the main program execution continues, printing started and then blocking on ReadLine. By the time Async.Sleep wakes up and wants to continue execution, the original thread is already blocked on ReadLine, so the async computation gets to continue running on a new one.
What's going on here? How does this function?
First, the way the async computation is structured is in "continuation-passing style". It's a technique where every function doesn't return its result to the caller, but calls another function instead, passing the result as its parameter.
Let me illustrate with an example:
// "Normal" style:
let f x = x + 5
let g x = x * 2
printfn "%d" (f (g 3)) // prints 11
// Continuation-passing style:
let f x next = next (x + 5)
let g x next = next (x * 2)
g 3 (fun res1 -> f res1 (fun res2 -> printfn "%d" res2))
This is called "continuation-passing" because the next parameters are called "continuations" - i.e. they're functions that express how the program continues after calling f or g. And yes, this is exactly what Async.FromContinuations means.
Seeming very silly and roundabout on the surface, what this allows us to do is for each function to decide when, how, or even if its continuation happens. For example, our f function from above could be doing something asynchronous instead of just plain returning the result:
let f x next = httpPost "http://calculator.com/add5" x next
Coding it in continuation-passing style would allow such function to not block the current thread while the request to calculator.com is in flight. What's wrong with blocking the thread, you ask? I'll refer you to the original answer that prompted your question in the first place.
Second, when you write those async { ... } blocks, the compiler gives you a little help. It takes what looks like a step-by-step imperative program and "unrolls" it into a series of continuation-passing calls. The "breaking" points for this unfolding are all the constructs that end with a bang - let!, do!, return!.
The above async block, for example, would look somethiing like this (F#-ish pseudocode):
let return42 onDone =
log "2"
onDone 42
let f onDone =
log "1"
return42 (fun x ->
log "3"
Async.Sleep (3 seconds) (fun () ->
log "4"
onDone ()
)
)
Here, you can plainly see that the return42 function simply calls its continuation right away, thus making the whole thing from log "1" to log "3" completely synchronous, whereas the Async.Sleep function doesn't call its continuation right away, instead scheduling it to be run later (in 3 seconds) on the thread pool. That's where the thread switching happens.
And here, finally, lies the answer to your question: in order to have the async computation jump threads, your callback passed to Async.FromContinuations should do anything but call the success continuation immediately.
A few notes for further investigation
The onDone technique in the above example is technically called "monadic bind", and indeed in real F# programs it's represented by the async.Bind method. This answer might also be of help understanding the concept.
The above is a bit of an oversimplification. In reality the async execution is a bit more complicated than that. Internally it uses a technique called "trampoline", which in plain terms is just a loop that runs a single thunk on every turn, but crucially, the running thunk can also "ask" it to run another thunk, and if it does, the loop will do so, and so on, forever, until the next thunk doesn't ask to run another thunk, and then the whole thing finally stops.
I specifically used Async.StartImmediate to start the computation in my example, because Async.StartImmediate will do just what it says on the tin: it will start running the computation immediately, right there. That's why everything ran on the same thread as the main program. There are many alternative starting functions in the Async module. For example, Async.Start will start the computation on the thread pool. The lines from log "1" to log "3" will still all happen synchronously, without thread switching between them, but it will happen on a different thread from log "start" and log "starting". In this case thread switching will happen before the async computation even starts, so it doesn't count.
I've got a Vec of futures created from calling an async function. After adding all the futures to the vector, I'd like to wait on the whole set, getting either a list of results, or a callback for each one that finishes.
I could simply loop or iterate over the vector of futures and call .await on each future and that would allow me to handle errors correctly and not have futures::future::join_all cancel the others, but I'm sure there's a more idiomatic way to accomplish this task.
I also want to be able to handle the futures as they complete so if I get enough information from the first few, I can cancel the remaining incomplete futures and not wait for them and discard their results, error or not. This would not be possible if I iterated over the vector in order.
What I'm looking for is a callback (closure, etc) that lets me accumulate the results as they come in so that I can handle errors appropriately or cancel the remainder of the futures (from within the callback) if I determine I don't need the rest of them.
I can tell that's asking for a headache from the borrow checker: trying to modify a future in a Vec in a callback from the async engine.
There's a number of Stack Overflow questions and Reddit posts that explain how join_all joins on a list of futures but cancels the rest if one fails, and how async engines may spawn threads or they may not or they're bad design if they do.
Use futures::select_all:
use futures::future; // 0.3.4
type Error = Box<dyn std::error::Error + Send + Sync + 'static>;
type Result<T, E = Error> = std::result::Result<T, E>;
async fn might_fail(fails: bool) -> Result<i32> {
if fails {
Ok(42)
} else {
Err("boom".into())
}
}
async fn many() -> Result<i32> {
let raw_futs = vec![might_fail(true), might_fail(false), might_fail(true)];
let unpin_futs: Vec<_> = raw_futs.into_iter().map(Box::pin).collect();
let mut futs = unpin_futs;
let mut sum = 0;
while !futs.is_empty() {
match future::select_all(futs).await {
(Ok(val), _index, remaining) => {
sum += val;
futs = remaining;
}
(Err(_e), _index, remaining) => {
// Ignoring all errors
futs = remaining;
}
}
if sum > 42 {
// Early exit
return Ok(sum);
}
}
Ok(sum)
}
This will poll all the futures in the collection, returning the first one that is not pending, its index, and the remaining pending or unpolled futures. You can then match on the Result and handle the success or failure case.
Call select_all inside of a loop. This gives you the ability to exit early from the function. When you exit, the futs vector is dropped and dropping a future cancels it.
As mentioned in comments by #kmdreko, #John Kugelman and #e-net4-the-comment-flagger, use join_all from the futures crate version 0.3. The join_all in futures version 0.2 has the cancel behavior described in the question, but join_all in futures crate version 0.3 does not.
For a quick/hacky solution without having to mess with select_all. You can use join_all, but instead of returning Result<T> return Result<Result<T>> - always returning Ok for the top-level result
Or you can use join_all
The example from the documentation:
use futures::future::join_all;
async fn foo(i: u32) -> u32 { i }
let futures = vec![foo(1), foo(2), foo(3)];
assert_eq!(join_all(futures).await, [1, 2, 3]);
In a language like C#, giving this code (I am not using the await keyword on purpose):
async Task Foo()
{
var task = LongRunningOperationAsync();
// Some other non-related operation
AnotherOperation();
result = task.Result;
}
In the first line, the long operation is run in another thread, and a Task is returned (that is a future). You can then do another operation that will run in parallel of the first one, and at the end, you can wait for the operation to be finished. I think that it is also the behavior of async/await in Python, JavaScript, etc.
On the other hand, in Rust, I read in the RFC that:
A fundamental difference between Rust's futures and those from other languages is that Rust's futures do not do anything unless polled. The whole system is built around this: for example, cancellation is dropping the future for precisely this reason. In contrast, in other languages, calling an async fn spins up a future that starts executing immediately.
In this situation, what is the purpose of async/await in Rust? Seeing other languages, this notation is a convenient way to run parallel operations, but I cannot see how it works in Rust if the calling of an async function does not run anything.
You are conflating a few concepts.
Concurrency is not parallelism, and async and await are tools for concurrency, which may sometimes mean they are also tools for parallelism.
Additionally, whether a future is immediately polled or not is orthogonal to the syntax chosen.
async / await
The keywords async and await exist to make creating and interacting with asynchronous code easier to read and look more like "normal" synchronous code. This is true in all of the languages that have such keywords, as far as I am aware.
Simpler code
This is code that creates a future that adds two numbers when polled
before
fn long_running_operation(a: u8, b: u8) -> impl Future<Output = u8> {
struct Value(u8, u8);
impl Future for Value {
type Output = u8;
fn poll(self: Pin<&mut Self>, _ctx: &mut Context) -> Poll<Self::Output> {
Poll::Ready(self.0 + self.1)
}
}
Value(a, b)
}
after
async fn long_running_operation(a: u8, b: u8) -> u8 {
a + b
}
Note that the "before" code is basically the implementation of today's poll_fn function
See also Peter Hall's answer about how keeping track of many variables can be made nicer.
References
One of the potentially surprising things about async/await is that it enables a specific pattern that wasn't possible before: using references in futures. Here's some code that fills up a buffer with a value in an asynchronous manner:
before
use std::io;
fn fill_up<'a>(buf: &'a mut [u8]) -> impl Future<Output = io::Result<usize>> + 'a {
futures::future::lazy(move |_| {
for b in buf.iter_mut() { *b = 42 }
Ok(buf.len())
})
}
fn foo() -> impl Future<Output = Vec<u8>> {
let mut data = vec![0; 8];
fill_up(&mut data).map(|_| data)
}
This fails to compile:
error[E0597]: `data` does not live long enough
--> src/main.rs:33:17
|
33 | fill_up_old(&mut data).map(|_| data)
| ^^^^^^^^^ borrowed value does not live long enough
34 | }
| - `data` dropped here while still borrowed
|
= note: borrowed value must be valid for the static lifetime...
error[E0505]: cannot move out of `data` because it is borrowed
--> src/main.rs:33:32
|
33 | fill_up_old(&mut data).map(|_| data)
| --------- ^^^ ---- move occurs due to use in closure
| | |
| | move out of `data` occurs here
| borrow of `data` occurs here
|
= note: borrowed value must be valid for the static lifetime...
after
use std::io;
async fn fill_up(buf: &mut [u8]) -> io::Result<usize> {
for b in buf.iter_mut() { *b = 42 }
Ok(buf.len())
}
async fn foo() -> Vec<u8> {
let mut data = vec![0; 8];
fill_up(&mut data).await.expect("IO failed");
data
}
This works!
Calling an async function does not run anything
The implementation and design of a Future and the entire system around futures, on the other hand, is unrelated to the keywords async and await. Indeed, Rust has a thriving asynchronous ecosystem (such as with Tokio) before the async / await keywords ever existed. The same was true for JavaScript.
Why aren't Futures polled immediately on creation?
For the most authoritative answer, check out this comment from withoutboats on the RFC pull request:
A fundamental difference between Rust's futures and those from other
languages is that Rust's futures do not do anything unless polled. The
whole system is built around this: for example, cancellation is
dropping the future for precisely this reason. In contrast, in other
languages, calling an async fn spins up a future that starts executing
immediately.
A point about this is that async & await in Rust are not inherently
concurrent constructions. If you have a program that only uses async &
await and no concurrency primitives, the code in your program will
execute in a defined, statically known, linear order. Obviously, most
programs will use some kind of concurrency to schedule multiple,
concurrent tasks on the event loop, but they don't have to. What this
means is that you can - trivially - locally guarantee the ordering of
certain events, even if there is nonblocking IO performed in between
them that you want to be asynchronous with some larger set of nonlocal
events (e.g. you can strictly control ordering of events inside of a
request handler, while being concurrent with many other request
handlers, even on two sides of an await point).
This property gives Rust's async/await syntax the kind of local
reasoning & low-level control that makes Rust what it is. Running up
to the first await point would not inherently violate that - you'd
still know when the code executed, it would just execute in two
different places depending on whether it came before or after an
await. However, I think the decision made by other languages to start
executing immediately largely stems from their systems which
immediately schedule a task concurrently when you call an async fn
(for example, that's the impression of the underlying problem I got
from the Dart 2.0 document).
Some of the Dart 2.0 background is covered by this discussion from munificent:
Hi, I'm on the Dart team. Dart's async/await was designed mainly by
Erik Meijer, who also worked on async/await for C#. In C#, async/await
is synchronous to the first await. For Dart, Erik and others felt that
C#'s model was too confusing and instead specified that an async
function always yields once before executing any code.
At the time, I and another on my team were tasked with being the
guinea pigs to try out the new in-progress syntax and semantics in our
package manager. Based on that experience, we felt async functions
should run synchronously to the first await. Our arguments were
mostly:
Always yielding once incurs a performance penalty for no good reason. In most cases, this doesn't matter, but in some it really
does. Even in cases where you can live with it, it's a drag to bleed a
little perf everywhere.
Always yielding means certain patterns cannot be implemented using async/await. In particular, it's really common to have code like
(pseudo-code here):
getThingFromNetwork():
if (downloadAlreadyInProgress):
return cachedFuture
cachedFuture = startDownload()
return cachedFuture
In other words, you have an async operation that you can call multiple times before it completes. Later calls use the same
previously-created pending future. You want to ensure you don't start
the operation multiple times. That means you need to synchronously
check the cache before starting the operation.
If async functions are async from the start, the above function can't use async/await.
We pleaded our case, but ultimately the language designers stuck with
async-from-the-top. This was several years ago.
That turned out to be the wrong call. The performance cost is real
enough that many users developed a mindset that "async functions are
slow" and started avoiding using it even in cases where the perf hit
was affordable. Worse, we see nasty concurrency bugs where people
think they can do some synchronous work at the top of a function and
are dismayed to discover they've created race conditions. Overall, it
seems users do not naturally assume an async function yields before
executing any code.
So, for Dart 2, we are now taking the very painful breaking change to
change async functions to be synchronous to the first await and
migrating all of our existing code through that transition. I'm glad
we're making the change, but I really wish we'd done the right thing
on day one.
I don't know if Rust's ownership and performance model place different
constraints on you where being async from the top really is better,
but from our experience, sync-to-the-first-await is clearly the better
trade-off for Dart.
cramert replies (note that some of this syntax is outdated now):
If you need code to execute immediately when a function is called
rather than later on when the future is polled, you can write your
function like this:
fn foo() -> impl Future<Item=Thing> {
println!("prints immediately");
async_block! {
println!("prints when the future is first polled");
await!(bar());
await!(baz())
}
}
Code examples
These examples use the async support in Rust 1.39 and the futures crate 0.3.1.
Literal transcription of the C# code
use futures; // 0.3.1
async fn long_running_operation(a: u8, b: u8) -> u8 {
println!("long_running_operation");
a + b
}
fn another_operation(c: u8, d: u8) -> u8 {
println!("another_operation");
c * d
}
async fn foo() -> u8 {
println!("foo");
let sum = long_running_operation(1, 2);
another_operation(3, 4);
sum.await
}
fn main() {
let task = foo();
futures::executor::block_on(async {
let v = task.await;
println!("Result: {}", v);
});
}
If you called foo, the sequence of events in Rust would be:
Something implementing Future<Output = u8> is returned.
That's it. No "actual" work is done yet. If you take the result of foo and drive it towards completion (by polling it, in this case via futures::executor::block_on), then the next steps are:
Something implementing Future<Output = u8> is returned from calling long_running_operation (it does not start work yet).
another_operation does work as it is synchronous.
the .await syntax causes the code in long_running_operation to start. The foo future will continue to return "not ready" until the computation is done.
The output would be:
foo
another_operation
long_running_operation
Result: 3
Note that there are no thread pools here: this is all done on a single thread.
async blocks
You can also use async blocks:
use futures::{future, FutureExt}; // 0.3.1
fn long_running_operation(a: u8, b: u8) -> u8 {
println!("long_running_operation");
a + b
}
fn another_operation(c: u8, d: u8) -> u8 {
println!("another_operation");
c * d
}
async fn foo() -> u8 {
println!("foo");
let sum = async { long_running_operation(1, 2) };
let oth = async { another_operation(3, 4) };
let both = future::join(sum, oth).map(|(sum, _)| sum);
both.await
}
Here we wrap synchronous code in an async block and then wait for both actions to complete before this function will be complete.
Note that wrapping synchronous code like this is not a good idea for anything that will actually take a long time; see What is the best approach to encapsulate blocking I/O in future-rs? for more info.
With a threadpool
// Requires the `thread-pool` feature to be enabled
use futures::{executor::ThreadPool, future, task::SpawnExt, FutureExt};
async fn foo(pool: &mut ThreadPool) -> u8 {
println!("foo");
let sum = pool
.spawn_with_handle(async { long_running_operation(1, 2) })
.unwrap();
let oth = pool
.spawn_with_handle(async { another_operation(3, 4) })
.unwrap();
let both = future::join(sum, oth).map(|(sum, _)| sum);
both.await
}
The purpose of async/await in Rust is to provide a toolkit for concurrency—same as in C# and other languages.
In C# and JavaScript, async methods start running immediately, and they're scheduled whether you await the result or not. In Python and Rust, when you call an async method, nothing happens (it isn't even scheduled) until you await it. But it's largely the same programming style either way.
The ability to spawn another task (that runs concurrent with and independent of the current task) is provided by libraries: see async_std::task::spawn and tokio::task::spawn.
As for why Rust async is not exactly like C#, well, consider the differences between the two languages:
Rust discourages global mutable state. In C# and JS, every async method call is implicitly added to a global mutable queue. It's a side effect to some implicit context. For better or worse, that's not Rust's style.
Rust is not a framework. It makes sense that C# provides a default event loop. It also provides a great garbage collector! Lots of things that come standard in other languages are optional libraries in Rust.
Consider this simple pseudo-JavaScript code that fetches some data, processes it, fetches some more data based on the previous step, summarises it, and then prints a result:
getData(url)
.then(response -> parseObjects(response.data))
.then(data -> findAll(data, 'foo'))
.then(foos -> getWikipediaPagesFor(foos))
.then(sumPages)
.then(sum -> console.log("sum is: ", sum));
In async/await form, that's:
async {
let response = await getData(url);
let objects = parseObjects(response.data);
let foos = findAll(objects, 'foo');
let pages = await getWikipediaPagesFor(foos);
let sum = sumPages(pages);
console.log("sum is: ", sum);
}
It introduces a lot of single-use variables and is arguably worse than the original version with promises. So why bother?
Consider this change, where the variables response and objects are needed later on in the computation:
async {
let response = await getData(url);
let objects = parseObjects(response.data);
let foos = findAll(objects, 'foo');
let pages = await getWikipediaPagesFor(foos);
let sum = sumPages(pages, objects.length);
console.log("sum is: ", sum, " and status was: ", response.status);
}
And try to rewrite it in the original form with promises:
getData(url)
.then(response -> Promise.resolve(parseObjects(response.data))
.then(objects -> Promise.resolve(findAll(objects, 'foo'))
.then(foos -> getWikipediaPagesFor(foos))
.then(pages -> sumPages(pages, objects.length)))
.then(sum -> console.log("sum is: ", sum, " and status was: ", response.status)));
Each time you need to refer back to a previous result, you need to nest the entire structure one level deeper. This can quickly become very difficult to read and maintain, but the async/await version does not suffer from this problem.
I have a recursive function. The function will call itself with various different values depending on the data it gets, so the arity and depth of recursion is not known: each call may call itself zero or more times. The function may return any number of values.
I want to parallelise it by getting goroutines and channels involved. Each recursion of inner runs in its own goroutine, and sends back a value on the channel. The outer function deals with those values.
func outer(response []int) {
results := make([]int)
resultsChannel := make(chan int)
inner := func(...) {
resultsChannel <- «some result»;
// Recurse in a new goroutine.
for _, recursionArgument in «some calculated data» {
go inner(recursionArgument)
}
}
go inner(«initial values»);
for {
result := <- resultsChannel
results = append(results, result)
// HELP! How do I decide when to break?
}
return results
}
The problem comes with escaping the results channel loop. Because of the 'shape' of the recursion (unknown arity and depth) I can't say "finish after n events" and I can't send a sentinel value.
How do I detect when all my recursions have happened and return from outer? Is there a better way to approach this?
You can use a sync.WaitGroup to manage the collection of goroutines you spawn: call Add(1) before spawning each new goroutine, and Done when each goroutine completes. So something like this:
var wg sync.WaitGroup
inner := func(...) {
...
// Recurse in a new goroutine.
for _, recursionArgument := range «some calculated data» {
wg.Add(1)
go inner(recursionArgument)
}
...
wg.Done()
}
wg.Add(1)
go inner(«initial values»)
Now waiting on wg will tell you when all the goroutines have completed.
If you are reading the results from a channel, the obvious way to tell when there are no more results is by closing the channel. You can achieve this through another goroutine to do this for us:
go func() {
wg.Wait()
close(resultsChannel)
}()
You should now be able to simply range over resultsChannel to read all the results.