Have a rust function pause execution and return a value multiple times in without losing its stack - asynchronous

I am working on a project that has long compute times to which I will have hundreds of nodes running it, as part of my implementation I have a status handler object/struct which talks to the API and gets the needed information like the arguments, the status handler then calls the main intensive function.
It order to keep tabs on the computationally intensive function I would like it to yield back to the status handler function with the completed percentage so the status handler can update the API and then allow the intensive function to continue computation without losing any of its stack (such as it variables and file handles)
I've looked into async function but they seem to only return once.
Thank you in advance!

What you are asking for are generators, which are currently not stable. You can either experiment with them on nightly, or manually create something that works like them and manually call it (though it won't has as nice syntax as generators). Something like this for example:
enum Status<T, K> {
Updated(T),
Finished(K)
}
struct State { /* ... */ }
impl State {
pub fn new() -> Self {
Self { /* ... */ }
}
pub fn call(self) -> Status<Self, ()> {
// Do some update on State and return
// State::Updated(self). When you finised
// return State::Finished(()). Note that this
// method consumes self. You could make it take
// &mut self, but then you would have to worry
// about how to prevent it beeing called after
// it finished. If you want to return some intermidiate
// value in each step you can make Status::Updated
// contain a (state, value) instead.
todo!()
}
}
fn foo() {
let mut state = State::new();
let result = loop {
match state.call() {
Status::Updated(s) => state = s,
Status::Finished(result) => break result
}
};
}

Async can in fact pause and resume, but it is meant for IO bound programs that basically wait for some external IO the entire time. It's not meant for computation heavy tasks.
There two ways that come to my mind on how to solve this problem:
threads & channels
callbacks
Solution 1: Threads & Channels
use std::{sync::mpsc, thread, time::Duration};
struct StatusUpdate {
task_id: i32,
percent: f32,
}
impl StatusUpdate {
pub fn new(task_id: i32, percent: f32) -> Self {
Self { task_id, percent }
}
}
fn expensive_computation(id: i32, status_update: mpsc::Sender<StatusUpdate>) {
status_update.send(StatusUpdate::new(id, 0.0)).unwrap();
thread::sleep(Duration::from_millis(1000));
status_update.send(StatusUpdate::new(id, 33.3)).unwrap();
thread::sleep(Duration::from_millis(1000));
status_update.send(StatusUpdate::new(id, 66.6)).unwrap();
thread::sleep(Duration::from_millis(1000));
status_update.send(StatusUpdate::new(id, 100.0)).unwrap();
}
fn main() {
let (status_sender_1, status_receiver) = mpsc::channel();
let status_sender_2 = status_sender_1.clone();
thread::spawn(move || expensive_computation(1, status_sender_1));
thread::spawn(move || expensive_computation(2, status_sender_2));
for status_update in status_receiver {
println!(
"Task {} is done {} %",
status_update.task_id, status_update.percent
);
}
}
Task 1 is done 0 %
Task 2 is done 0 %
Task 1 is done 33.3 %
Task 2 is done 33.3 %
Task 1 is done 66.6 %
Task 2 is done 66.6 %
Task 2 is done 100 %
Task 1 is done 100 %
Solution 2: Callbacks
use std::{thread, time::Duration};
struct StatusUpdate {
task_id: i32,
percent: f32,
}
impl StatusUpdate {
pub fn new(task_id: i32, percent: f32) -> Self {
Self { task_id, percent }
}
}
fn expensive_computation<F: FnMut(StatusUpdate)>(id: i32, mut update_status: F) {
update_status(StatusUpdate::new(id, 0.0));
thread::sleep(Duration::from_millis(1000));
update_status(StatusUpdate::new(id, 33.3));
thread::sleep(Duration::from_millis(1000));
update_status(StatusUpdate::new(id, 66.6));
thread::sleep(Duration::from_millis(1000));
update_status(StatusUpdate::new(id, 100.0));
}
fn main() {
expensive_computation(1, |status_update| {
println!(
"Task {} is done {} %",
status_update.task_id, status_update.percent
);
});
}
Task 1 is done 0 %
Task 1 is done 33.3 %
Task 1 is done 66.6 %
Task 1 is done 100 %
Note that with the channels solution it is much easier to handle multiple computations on different threads at once. With callbacks, communicating between threads is hard/impossible.
am I able to pause execution of the expensive function while I do something with that data then allow it to resume?
No, this is not something that can be done with threads. In general.
You can run them with a lower priority than the main thread, which means they won't be scheduled as aggressively, which decreases the latency in the main thread. But in general, operating systems are preemptive and are capable of switching back and forth between threads, so you shouldn't worry about 'pausing'.

Related

Is it possible to implement a feature like Java's CompletableFuture::complete in Rust?

I'm a beginner in rust, and I'm trying to use rust's asynchronous programming.
In my requirement scenario, I want to create a empty Future and complete it in another thread after a complex multi-round scheduling process. The CompletableFuture::complete of Java can meet my needs very well.
I have tried to find an implementation of Rust, but haven't found one yet.
Is it possible to do it in Rust?
I understand from the comments below that using a channel for this is more in line with rust's design.
My scenario is a hierarchical scheduling executor.
For example, Task1 will be splitted to several Drivers, each Driver will use multi thread(rayon threadpool) to do some computation work, and the former driver's state change will trigger the execution of next driver, the result of the whole task is the last driver's output and the intermedia drivers have no output. That is to say, my async function cannot get result from one spawn task directly, so I need a shared stack variable or a channel to transfer the result.
So what I really want is this: the last driver which is executed in a rayon thread, it can get a channel's tx by it's identify without storing it (to simplify the state change process).
I found the tx and rx of oneshot cannot be copies and they are not thread safe, and the send method of tx need ownership. So, I can't store the tx in main thread and let the last driver find it's tx by identify. But I can use mpsc to do that, I worte 2 demos and pasted it into the body of the question, but I have to create mpsc with capacity 1 and close it manually.
I wrote 2 demos, as bellow.I wonder if this is an appropriate and efficient use of mpsc?
Version implemented using oneshot, cannot work.
#[tokio::test]
pub async fn test_async() -> Result<()>{
let mut executor = Executor::new();
let res1 = executor.run(1).await?;
let res2 = executor.run(2).await?;
println!("res1 {}, res2 {}", res1, res2);
Ok(())
}
struct Executor {
pub pool: ThreadPool,
pub txs: Arc<DashMap<i32, RwLock<oneshot::Sender<i32>>>>,
}
impl Executor {
pub fn new() -> Self {
Executor{
pool: ThreadPoolBuilder::new().num_threads(10).build().unwrap(),
txs: Arc::new(DashMap::new()),
}
}
pub async fn run(&mut self, index: i32) -> Result<i32> {
let (tx, rx) = oneshot::channel();
self.txs.insert(index, RwLock::new(tx));
let txs_clone = self.txs.clone();
self.pool.spawn(move || {
let spawn_tx = txs_clone.get(&index).unwrap();
let guard = block_on(spawn_tx.read());
// cannot work, send need ownership, it will cause move of self
guard.send(index);
});
let res = rx.await;
return Ok(res.unwrap());
}
}
Version implemented using mpsc, can work, not sure about performance
#[tokio::test]
pub async fn test_async() -> Result<()>{
let mut executor = Executor::new();
let res1 = executor.run(1).await?;
let res2 = executor.run(2).await?;
println!("res1 {}, res2 {}", res1, res2);
// close channel after task finished
executor.close(1);
executor.close(2);
Ok(())
}
struct Executor {
pub pool: ThreadPool,
pub txs: Arc<DashMap<i32, RwLock<mpsc::Sender<i32>>>>,
}
impl Executor {
pub fn new() -> Self {
Executor{
pool: ThreadPoolBuilder::new().num_threads(10).build().unwrap(),
txs: Arc::new(DashMap::new()),
}
}
pub fn close(&mut self, index:i32) {
self.txs.remove(&index);
}
pub async fn run(&mut self, index: i32) -> Result<i32> {
let (tx, mut rx) = mpsc::channel(1);
self.txs.insert(index, RwLock::new(tx));
let txs_clone = self.txs.clone();
self.pool.spawn(move || {
let spawn_tx = txs_clone.get(&index).unwrap();
let guard = block_on(spawn_tx.value().read());
block_on(guard.deref().send(index));
});
// 0 mock invalid value
let mut res:i32 = 0;
while let Some(data) = rx.recv().await {
println!("recv data {}", data);
res = data;
break;
}
return Ok(res);
}
}
Disclaimer: It's really hard to picture what you are attempting to achieve, because the examples provided are trivial to solve, with no justification for the added complexity (DashMap). As such, this answer will be progressive, though it will remain focused on solving the problem you demonstrated you had, and not necessarily the problem you're thinking of... as I have no crystal ball.
We'll be using the following Result type in the examples:
type Result<T> = Result<T, Box<dyn Error + Send + Sync + 'static>>;
Serial execution
The simplest way to execute a task, is to do so right here, right now.
impl Executor {
pub async fn run<F>(&self, task: F) -> Result<i32>
where
F: FnOnce() -> Future<Output = Result<i32>>,
{
task().await
}
}
Async execution - built-in
When the execution of a task may involve heavy-weight calculations, it may be beneficial to execute it on a background thread.
Whichever runtime you are using probably supports this functionality, I'll demonstrate with tokio:
impl Executor {
pub async fn run<F>(&self, task: F) -> Result<i32>
where
F: FnOnce() -> Result<i32>,
{
Ok(tokio::task::spawn_block(task).await??)
}
}
Async execution - one-shot
If you wish to have more control on the number of CPU-bound threads, either to limit them, or to partition the CPUs of the machine for different needs, then the async runtime may not be configurable enough and you may prefer to use a thread-pool instead.
In this case, synchronization back with the runtime can be achieved via channels, the simplest of which being the oneshot channel.
impl Executor {
pub async fn run<F>(&self, task: F) -> Result<i32>
where
F: FnOnce() -> Result<i32>,
{
let (tx, mut rx) = oneshot::channel();
self.pool.spawn(move || {
let result = task();
// Decide on how to handle the fact that nobody will read the result.
let _ = tx.send(result);
});
Ok(rx.await??)
}
}
Note that in all of the above solutions, task remains agnostic as to how it's executed. This is a property you should strive for, as it makes it easier to change the way execution is handled in the future by more neatly separating the two concepts.

Why does tokio::spawn have a delay when called next to crossbeam_channel::select?

I'm creating a task which will spawn other tasks. Some of them will take some time, so they cannot be awaited, but they can run in parallel:
src/main.rs
use crossbeam::crossbeam_channel::{bounded, select};
#[tokio::main]
async fn main() {
let (s, r) = bounded::<usize>(1);
tokio::spawn(async move {
let mut counter = 0;
loop {
let loop_id = counter.clone();
tokio::spawn(async move { // why this one was not fired?
println!("inner task {}", loop_id);
}); // .await.unwrap(); - solves issue, but this is long task which cannot be awaited
println!("loop {}", loop_id);
select! {
recv(r) -> rr => {
// match rr {
// Ok(ee) => {
// println!("received from channel {}", loop_id);
// tokio::spawn(async move {
// println!("received from channel task {}", loop_id);
// });
// },
// Err(e) => println!("{}", e),
// };
},
// more recv(some_channel) ->
}
counter = counter + 1;
}
});
// let s_clone = s.clone();
// tokio::spawn(async move {
// s_clone.send(2).unwrap();
// });
loop {
// rest of the program
}
}
I've noticed strange behavior. This outputs:
loop 0
I was expecting it to also output inner task 0.
If I send a value to channel, the output will be:
loop 0
inner task 0
loop 1
This is missing inner task 1.
Why is inner task spawned with one loop of delay?
The first time I noticed such behavior with 'received from channel task' delayed one loop, but when I reduced code to prepare sample this started to happen with 'inner task'. It might be worth mentioning that if I write second tokio::spawn right to another, only the last one will have this issue. Is there something I should be aware when calling tokio::spawn and select!? What causes this one loop of delay?
Cargo.toml dependencies
[dependencies]
tokio = { version = "0.2", features = ["full"] }
crossbeam = "0.7"
Rust 1.46, Windows 10
select! is blocking, and the docs for tokio::spawn say:
The spawned task may execute on the current thread, or it may be sent to a different thread to be executed.
In this case, the select! "future" is actually a blocking function, and spawn doesn't use a new thread (either in the first invocation or the one inside the loop).
Because you don't tell tokio that you are going to block, tokio doesn't think another thread is needed (from tokio's perspective, you only have 3 futures which should never block, so why would you need another thread anyway?).
The solution is to use the tokio::task::spawn_blocking for the select!-ing closure (which will no longer be a future, so async move {} is now move || {}).
Now tokio will know that this function actually blocks, and will move it to another thread (while keeping all the actual futures in other execution threads).
use crossbeam::crossbeam_channel::{bounded, select};
#[tokio::main]
async fn main() {
let (s, r) = bounded::<usize>(1);
tokio::task::spawn_blocking(move || {
// ...
});
loop {
// rest of the program
}
}
Link to playground
Another possible solution is to use a non-blocking channel like tokio::sync::mpsc, on which you can use await and get the expected behavior, like this playground example with direct recv().await or with tokio::select!, like this:
use tokio::sync::mpsc;
#[tokio::main]
async fn main() {
let (mut s, mut r) = mpsc::channel::<usize>(1);
tokio::spawn(async move {
loop {
// ...
tokio::select! {
Some(i) = r.recv() => {
println!("got = {}", i);
}
}
}
});
loop {
// rest of the program
}
}
Link to playground

How can I stop running synchronous code when the future wrapping it is dropped?

I have asynchronous code which calls synchronous code that takes a while to run, so I followed the suggestions outlined in What is the best approach to encapsulate blocking I/O in future-rs?. However, my asynchronous code has a timeout, after which I am no longer interested in the result of the synchronous calculation:
use std::{thread, time::Duration};
use tokio::{task, time}; // 0.2.10
// This takes 1 second
fn long_running_complicated_calculation() -> i32 {
let mut sum = 0;
for i in 0..10 {
thread::sleep(Duration::from_millis(100));
eprintln!("{}", i);
sum += i;
// Interruption point
}
sum
}
#[tokio::main]
async fn main() {
let handle = task::spawn_blocking(long_running_complicated_calculation);
let guarded = time::timeout(Duration::from_millis(250), handle);
match guarded.await {
Ok(s) => panic!("Sum was calculated: {:?}", s),
Err(_) => eprintln!("Sum timed out (expected)"),
}
}
Running this code shows that the timeout fires, but the synchronous code also continues to run:
0
1
Sum timed out (expected)
2
3
4
5
6
7
8
9
How can I stop running the synchronous code when the future wrapping it is dropped?
I don't expect that the compiler will magically be able to stop my synchronous code. I've annotated a line with "interruption point" where I'd be required to manually put some kind of check to exit early from my function, but I don't know how to easily get a notification that the result of spawn_blocking (or ThreadPool::spawn_with_handle, for pure futures-based code) has been dropped.
You can pass an atomic boolean which you then use to flag the task as needing cancellation. (I'm not sure I'm using an appropriate Ordering for the load/store calls, that probably needs some more consideration)
Here's a modified version of your code that outputs
0
1
Sum timed out (expected)
2
Interrupted...
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::{thread, time::Duration};
use tokio::{task, time}; // 0.2.10
// This takes 1 second
fn long_running_complicated_calculation(flag: &AtomicBool) -> i32 {
let mut sum = 0;
for i in 0..10 {
thread::sleep(Duration::from_millis(100));
eprintln!("{}", i);
sum += i;
// Interruption point
if !flag.load(Ordering::Relaxed) {
eprintln!("Interrupted...");
break;
}
}
sum
}
#[tokio::main]
async fn main() {
let some_bool = Arc::new(AtomicBool::new(true));
let some_bool_clone = some_bool.clone();
let handle =
task::spawn_blocking(move || long_running_complicated_calculation(&some_bool_clone));
let guarded = time::timeout(Duration::from_millis(250), handle);
match guarded.await {
Ok(s) => panic!("Sum was calculated: {:?}", s),
Err(_) => {
eprintln!("Sum timed out (expected)");
some_bool.store(false, Ordering::Relaxed);
}
}
}
playground
It's not really possible to get this to happen automatically on the dropping of the futures / handles with current Tokio. Some work towards this is being done in http://github.com/tokio-rs/tokio/issues/1830 and http://github.com/tokio-rs/tokio/issues/1879.
However, you can get something similar by wrapping the futures in a custom type.
Here's an example which looks almost the same as the original code, but with the addition of a simple wrapper type in a module. It would be even more ergonomic if I implemented Future<T> on the wrapper type that just forwards to the wrapped handle, but that was proving tiresome.
mod blocking_cancelable_task {
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use tokio::task;
pub struct BlockingCancelableTask<T> {
pub h: Option<tokio::task::JoinHandle<T>>,
flag: Arc<AtomicBool>,
}
impl<T> Drop for BlockingCancelableTask<T> {
fn drop(&mut self) {
eprintln!("Dropping...");
self.flag.store(false, Ordering::Relaxed);
}
}
impl<T> BlockingCancelableTask<T>
where
T: Send + 'static,
{
pub fn new<F>(f: F) -> BlockingCancelableTask<T>
where
F: FnOnce(&AtomicBool) -> T + Send + 'static,
{
let flag = Arc::new(AtomicBool::new(true));
let flag_clone = flag.clone();
let h = task::spawn_blocking(move || f(&flag_clone));
BlockingCancelableTask { h: Some(h), flag }
}
}
pub fn spawn<F, T>(f: F) -> BlockingCancelableTask<T>
where
T: Send + 'static,
F: FnOnce(&AtomicBool) -> T + Send + 'static,
{
BlockingCancelableTask::new(f)
}
}
use std::sync::atomic::{AtomicBool, Ordering};
use std::{thread, time::Duration};
use tokio::time; // 0.2.10
// This takes 1 second
fn long_running_complicated_calculation(flag: &AtomicBool) -> i32 {
let mut sum = 0;
for i in 0..10 {
thread::sleep(Duration::from_millis(100));
eprintln!("{}", i);
sum += i;
// Interruption point
if !flag.load(Ordering::Relaxed) {
eprintln!("Interrupted...");
break;
}
}
sum
}
#[tokio::main]
async fn main() {
let mut h = blocking_cancelable_task::spawn(long_running_complicated_calculation);
let guarded = time::timeout(Duration::from_millis(250), h.h.take().unwrap());
match guarded.await {
Ok(s) => panic!("Sum was calculated: {:?}", s),
Err(_) => {
eprintln!("Sum timed out (expected)");
}
}
}
playground

What happens to an async task when it is aborted?

Rust has async methods that can be tied to Abortable futures. The documentation says that, when aborted:
the future will complete immediately without making any further progress.
Will the variables owned by the task bound to the future be dropped? If those variables implement drop, will drop be called? If the future has spawned other futures, will all of them be aborted in a chain?
E.g.: In the following snippet, I don't see the destructor happening for the aborted task, but I don't know if it is not called or happens in a separate thread where the print is not shown.
use futures::executor::block_on;
use futures::future::{AbortHandle, Abortable};
struct S {
i: i32,
}
impl Drop for S {
fn drop(&mut self) {
println!("dropping S");
}
}
async fn f() -> i32 {
let s = S { i: 42 };
std::thread::sleep(std::time::Duration::from_secs(2));
s.i
}
fn main() {
println!("first test...");
let (abort_handle, abort_registration) = AbortHandle::new_pair();
let _ = Abortable::new(f(), abort_registration);
abort_handle.abort();
std::thread::sleep(std::time::Duration::from_secs(1));
println!("second test...");
let (_, abort_registration) = AbortHandle::new_pair();
let task = Abortable::new(f(), abort_registration);
block_on(task).unwrap();
std::thread::sleep(std::time::Duration::from_secs(1));
}
playground
Yes, values that have been created will be dropped.
In your first example, the future returned by f is never started, so the S is never created. This means that it cannot be dropped.
In the second example, the value is dropped.
This is more obvious if you both run the future and abort it. Here, I spawn two concurrent futures:
create an S and waits 200ms
wait 100ms and abort future #1
use futures::future::{self, AbortHandle, Abortable};
use std::time::Duration;
use tokio::time;
struct S {
i: i32,
}
impl S {
fn new(i: i32) -> Self {
println!("Creating S {}", i);
S { i }
}
}
impl Drop for S {
fn drop(&mut self) {
println!("Dropping S {}", self.i);
}
}
#[tokio::main]
async fn main() {
let create_s = async {
let s = S::new(42);
time::delay_for(Duration::from_millis(200)).await;
println!("Creating {} done", s.i);
};
let (abort_handle, abort_registration) = AbortHandle::new_pair();
let create_s = Abortable::new(create_s, abort_registration);
let abort_s = async move {
time::delay_for(Duration::from_millis(100)).await;
abort_handle.abort();
};
let c = tokio::spawn(create_s);
let a = tokio::spawn(abort_s);
let (c, a) = future::join(c, a).await;
println!("{:?}, {:?}", c, a);
}
Creating S 42
Dropping S 42
Ok(Err(Aborted)), Ok(())
Note that I've switched to Tokio to be able to use time::delay_for, as you should never use blocking operations in an async function.
See also:
Why does Future::select choose the future with a longer sleep period first?
What is the best approach to encapsulate blocking I/O in future-rs?
If the future has spawned other futures, will all of them be aborted in a chain?
No, when you spawn a future, it is disconnected from where it was spawned.
See also:
What is the purpose of async/await in Rust?

Is there any way to create a async stream generator that yields the result of repeatedly calling a function?

I want to build a program that collects weather updates and represents them as a stream. I want to call get_weather() in an infinite loop, with 60 seconds delay between finish and start.
A simplified version would look like this:
async fn get_weather() -> Weather { /* ... */ }
fn get_weather_stream() -> impl futures::Stream<Item = Weather> {
loop {
tokio::timer::delay_for(std::time::Duration::from_secs(60)).await;
let weather = get_weather().await;
yield weather; // This is not supported
// Note: waiting for get_weather() stops the timer and avoids overflows.
}
}
Is there any way to do this easily?
Using tokio::timer::Interval will not work when get_weather() takes more than 60 seconds:
fn get_weather_stream() -> impl futures::Stream<Item = Weather> {
tokio::timer::Interval::new_with_delay(std::time::Duration::from_secs(60))
.then(|| get_weather())
}
If that happens, the next function will start immediately. I want to keep exactly 60 seconds between the previous get_weather() start and the next get_weather() start.
Use stream::unfold to go from the "world of futures" to the "world of streams". We don't need any extra state, so we use the empty tuple:
use futures::StreamExt; // 0.3.4
use std::time::Duration;
use tokio::time; // 0.2.11
struct Weather;
async fn get_weather() -> Weather {
Weather
}
const BETWEEN: Duration = Duration::from_secs(1);
fn get_weather_stream() -> impl futures::Stream<Item = Weather> {
futures::stream::unfold((), |_| async {
time::delay_for(BETWEEN).await;
let weather = get_weather().await;
Some((weather, ()))
})
}
#[tokio::main]
async fn main() {
get_weather_stream()
.take(3)
.for_each(|_v| async {
println!("Got the weather");
})
.await;
}
% time ./target/debug/example
Got the weather
Got the weather
Got the weather
real 3.085 3085495us
user 0.004 3928us
sys 0.003 3151us
See also:
How do I convert an iterator into a stream on success or an empty stream on failure?
How do I iterate over a Vec of functions returning Futures in Rust?
Creating a stream of values while calling async fns?

Resources