Is there any way to create an async stream generator that yields the result of repeatedly calling a function?

I want to build a program that collects weather updates and represents them as a stream. I want to call get_weather() in an infinite loop, with a 60-second delay between one call finishing and the next one starting.
A simplified version would look like this:
async fn get_weather() -> Weather { /* ... */ }

fn get_weather_stream() -> impl futures::Stream<Item = Weather> {
    loop {
        tokio::timer::delay_for(std::time::Duration::from_secs(60)).await;
        let weather = get_weather().await;
        yield weather; // This is not supported
        // Note: waiting for get_weather() stops the timer and avoids overflows.
    }
}
Is there any way to do this easily?
Using tokio::timer::Interval will not work when get_weather() takes more than 60 seconds:
fn get_weather_stream() -> impl futures::Stream<Item = Weather> {
    tokio::timer::Interval::new_with_delay(std::time::Duration::from_secs(60))
        .then(|_| get_weather())
}
If that happens, the next call will start immediately. I want to keep exactly 60 seconds between the end of the previous get_weather() call and the start of the next one.

Use stream::unfold to go from the "world of futures" to the "world of streams". We don't need any extra state, so we use the empty tuple:
use futures::StreamExt; // 0.3.4
use std::time::Duration;
use tokio::time; // 0.2.11

struct Weather;

async fn get_weather() -> Weather {
    Weather
}

const BETWEEN: Duration = Duration::from_secs(1);

fn get_weather_stream() -> impl futures::Stream<Item = Weather> {
    futures::stream::unfold((), |_| async {
        time::delay_for(BETWEEN).await;
        let weather = get_weather().await;
        Some((weather, ()))
    })
}

#[tokio::main]
async fn main() {
    get_weather_stream()
        .take(3)
        .for_each(|_v| async {
            println!("Got the weather");
        })
        .await;
}
% time ./target/debug/example
Got the weather
Got the weather
Got the weather
real    3.085s
user    0.004s
sys     0.003s
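As a side note, here is a minimal sketch of an alternative that keeps the yield syntax from the question, assuming the async-stream crate is an acceptable extra dependency; its stream! macro turns an async block containing yield into an impl Stream:

use async_stream::stream; // assumed extra dependency, not used in the answer above
use std::time::Duration;
use tokio::time; // same 0.2 API as above

struct Weather;

async fn get_weather() -> Weather {
    Weather
}

const BETWEEN: Duration = Duration::from_secs(1);

fn get_weather_stream() -> impl futures::Stream<Item = Weather> {
    stream! {
        // The loop from the question, now with a working `yield`.
        loop {
            time::delay_for(BETWEEN).await;
            yield get_weather().await;
        }
    }
}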
See also:
How do I convert an iterator into a stream on success or an empty stream on failure?
How do I iterate over a Vec of functions returning Futures in Rust?
Creating a stream of values while calling async fns?

Related

Have a Rust function pause execution and return a value multiple times without losing its stack

I am working on a project with long compute times that will run on hundreds of nodes. As part of my implementation I have a status handler object/struct which talks to the API and gets the needed information, such as the arguments; the status handler then calls the main intensive function.
In order to keep tabs on the computationally intensive function, I would like it to yield back to the status handler with the completed percentage, so the status handler can update the API and then let the intensive function continue computing without losing any of its stack (such as its variables and file handles).
I've looked into async functions, but they seem to only return once.
Thank you in advance!
What you are asking for are generators, which are currently not stable. You can either experiment with them on nightly, or write something that works like them by hand and call it manually (though it won't have as nice a syntax as generators). Something like this, for example:
enum Status<T, K> {
    Updated(T),
    Finished(K),
}

struct State { /* ... */ }

impl State {
    pub fn new() -> Self {
        Self { /* ... */ }
    }

    pub fn call(self) -> Status<Self, ()> {
        // Do some update on State and return
        // Status::Updated(self). When you are finished,
        // return Status::Finished(()). Note that this
        // method consumes self. You could make it take
        // &mut self, but then you would have to worry
        // about how to prevent it being called after
        // it finished. If you want to return some intermediate
        // value in each step, you can make Status::Updated
        // contain a (state, value) pair instead.
        todo!()
    }
}

fn foo() {
    let mut state = State::new();
    let result = loop {
        match state.call() {
            Status::Updated(s) => state = s,
            Status::Finished(result) => break result,
        }
    };
}
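For the nightly route mentioned above, here is a rough sketch of the unstable generators feature as it looked at the time. Treat this as illustrative only: the feature gate and traits are unstable and have since been reworked (renamed to coroutines on newer nightlies), so the exact names may differ on your toolchain.

#![feature(generators, generator_trait)]

use std::ops::{Generator, GeneratorState};
use std::pin::Pin;

fn main() {
    // A generator that yields progress percentages and keeps its local
    // state alive between resumptions.
    let mut progress = || {
        yield 33.3;
        yield 66.6;
        yield 100.0;
        "finished"
    };

    loop {
        match Pin::new(&mut progress).resume(()) {
            GeneratorState::Yielded(pct) => println!("{} % complete", pct),
            GeneratorState::Complete(msg) => {
                println!("{}", msg);
                break;
            }
        }
    }
}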
Async can in fact pause and resume, but it is meant for IO-bound programs that basically wait for some external IO the entire time; it's not meant for computation-heavy tasks.
Two ways to solve this problem come to mind:
threads & channels
callbacks
Solution 1: Threads & Channels
use std::{sync::mpsc, thread, time::Duration};

struct StatusUpdate {
    task_id: i32,
    percent: f32,
}

impl StatusUpdate {
    pub fn new(task_id: i32, percent: f32) -> Self {
        Self { task_id, percent }
    }
}

fn expensive_computation(id: i32, status_update: mpsc::Sender<StatusUpdate>) {
    status_update.send(StatusUpdate::new(id, 0.0)).unwrap();
    thread::sleep(Duration::from_millis(1000));
    status_update.send(StatusUpdate::new(id, 33.3)).unwrap();
    thread::sleep(Duration::from_millis(1000));
    status_update.send(StatusUpdate::new(id, 66.6)).unwrap();
    thread::sleep(Duration::from_millis(1000));
    status_update.send(StatusUpdate::new(id, 100.0)).unwrap();
}

fn main() {
    let (status_sender_1, status_receiver) = mpsc::channel();
    let status_sender_2 = status_sender_1.clone();

    thread::spawn(move || expensive_computation(1, status_sender_1));
    thread::spawn(move || expensive_computation(2, status_sender_2));

    for status_update in status_receiver {
        println!(
            "Task {} is done {} %",
            status_update.task_id, status_update.percent
        );
    }
}
Task 1 is done 0 %
Task 2 is done 0 %
Task 1 is done 33.3 %
Task 2 is done 33.3 %
Task 1 is done 66.6 %
Task 2 is done 66.6 %
Task 2 is done 100 %
Task 1 is done 100 %
Solution 2: Callbacks
use std::{thread, time::Duration};

struct StatusUpdate {
    task_id: i32,
    percent: f32,
}

impl StatusUpdate {
    pub fn new(task_id: i32, percent: f32) -> Self {
        Self { task_id, percent }
    }
}

fn expensive_computation<F: FnMut(StatusUpdate)>(id: i32, mut update_status: F) {
    update_status(StatusUpdate::new(id, 0.0));
    thread::sleep(Duration::from_millis(1000));
    update_status(StatusUpdate::new(id, 33.3));
    thread::sleep(Duration::from_millis(1000));
    update_status(StatusUpdate::new(id, 66.6));
    thread::sleep(Duration::from_millis(1000));
    update_status(StatusUpdate::new(id, 100.0));
}

fn main() {
    expensive_computation(1, |status_update| {
        println!(
            "Task {} is done {} %",
            status_update.task_id, status_update.percent
        );
    });
}
Task 1 is done 0 %
Task 1 is done 33.3 %
Task 1 is done 66.6 %
Task 1 is done 100 %
Note that with the channel-based solution it is much easier to handle multiple computations on different threads at once; with callbacks, communicating across threads is much harder.
Am I able to pause execution of the expensive function while I do something with that data, then allow it to resume?
No, in general this is not something that can be done with threads.
You can run them with a lower priority than the main thread, which means they won't be scheduled as aggressively, which decreases the latency in the main thread. But in general, operating systems are preemptive and are capable of switching back and forth between threads, so you shouldn't worry about 'pausing'.

How to integrate async data collection with threadpool data processing in Rust

I'd like to improve the integration of my async data collection with my rayon data processing by overlapping the retrieval and the processing. Currently, I pull lots of pages from a web site using normal async code. Once that is complete, I do the cpu-intensive work using rayon's par_iter.
It seems like I should be able to easily overlap the processing, so that I'm not waiting for every last page before I begin the grunt work. Every page that I retrieve is independent of the others, so there is no need to wait before the conversion.
Here's what I have working currently (simplified just a bit):
use rayon::prelude::*;
use futures::{stream, StreamExt};
use reqwest::{Client, Result};

const CONCURRENT_REQUESTS: usize = usize::MAX;
const MAX_PAGE: usize = 1000;

#[tokio::main]
async fn main() {
    // get data from server
    let client = Client::new();
    let bodies: Vec<Result<String>> = stream::iter(1..MAX_PAGE + 1)
        .map(|page_number| {
            let client = &client;
            async move {
                client
                    .get(format!("https://someurl?{page_number}"))
                    .send()
                    .await?
                    .text()
                    .await
            }
        })
        .buffer_unordered(CONCURRENT_REQUESTS)
        .collect()
        .await;

    // transform the data
    let mut rows: Vec<MyRow> = bodies
        .par_iter()
        .filter_map(|body| body.as_ref().ok())
        .map(|data| {
            let page = serde_json::from_str::<MyPage>(data).unwrap();
            page.rows
                .iter()
                .map(|x| MyRow::new(x))
                .collect::<Vec<MyRow>>()
        })
        .flatten()
        .collect();

    // do something with rows
}
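One hedged sketch of the kind of overlap being asked about, with the HTTP calls and JSON parsing replaced by stand-ins: push each body into a channel as soon as it arrives and let rayon consume the channel through par_bridge(), so the CPU work starts while later pages are still downloading.

use futures::{stream, StreamExt};
use rayon::prelude::*;
use std::sync::mpsc;

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel::<String>();

    // CPU-heavy consumer: rayon pulls bodies off the channel as they arrive.
    let processor = std::thread::spawn(move || {
        rx.into_iter()
            .par_bridge()
            .map(|body| body.len()) // stand-in for the real JSON -> rows work
            .collect::<Vec<_>>()
    });

    // Async producer: stand-in for the buffered reqwest calls.
    stream::iter(1..=20usize)
        .map(|page_number| async move { format!("body of page {}", page_number) })
        .buffer_unordered(8)
        .for_each(|body| {
            tx.send(body).expect("receiver alive");
            futures::future::ready(())
        })
        .await;
    drop(tx); // close the channel so par_bridge's iterator ends

    let rows = processor.join().unwrap();
    println!("processed {} pages", rows.len());
}

The same shape should apply with the real reqwest stream as the producer and the serde_json work inside the par_bridge() pipeline.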

Rust: async is not concurrent

Here's the example from the Rust book.
async fn learn_and_sing() {
    // Wait until the song has been learned before singing it.
    // We use `.await` here rather than `block_on` to prevent blocking the
    // thread, which makes it possible to `dance` at the same time.
    let song = learn_song().await;
    sing_song(song).await;
}

async fn async_main() {
    let f1 = learn_and_sing();
    let f2 = dance();

    // `join!` is like `.await` but can wait for multiple futures concurrently.
    // If we're temporarily blocked in the `learn_and_sing` future, the `dance`
    // future will take over the current thread. If `dance` becomes blocked,
    // `learn_and_sing` can take back over. If both futures are blocked, then
    // `async_main` is blocked and will yield to the executor.
    futures::join!(f1, f2);
}

fn main() {
    block_on(async_main());
}
And it says:
In this example, learning the song must happen before singing the song, but both learning and singing can happen at the same time as dancing.
But I can't quite get this point, so I wrote a short program in Rust:
async fn learn_song() -> &'static str {
    println!("learn_song");
    "some song"
}

#[allow(unused_variables)]
async fn sing_song(song: &str) {
    println!("sing_song");
}

async fn dance() {
    println!("dance");
}

async fn learn_and_sing() {
    let song = learn_song().await;
    std::thread::sleep(std::time::Duration::from_secs(1));
    sing_song(song).await;
}

async fn async_main() {
    let f1 = learn_and_sing();
    let f2 = dance();
    let f3 = learn_and_sing();
    futures::join!(f1, f2, f3);
}

fn main() {
    futures::executor::block_on(async_main());
}
It seems like all the async functions in async_main were executed synchronously.
The output is
learn_song
sing_song
dance
learn_song
sing_song
If they ran concurrently, I would expect to get something like this in my output:
learn_song
dance
learn_song
sing_song
sing_song
If I add an extra call to learn_and_sing, it is still printed as if everything ran synchronously.
The question: Why is that? Is it possible to get real concurrency using only async/.await and no threads?
As tkausl's comment states, std::thread::sleep puts the whole thread to sleep, which prevents any code on that thread from executing for the duration. You can use async_std::task::sleep in this situation: it is an asynchronous version of sleep that yields control back to the executor instead of blocking the thread, so the other futures can make progress while one of them is waiting.
async fn learn_song() -> &'static str {
    println!("learn_song");
    "some song"
}

#[allow(unused_variables)]
async fn sing_song(song: &str) {
    println!("sing_song");
}

async fn dance() {
    println!("dance");
}

async fn learn_and_sing() {
    let song = learn_song().await;
    async_std::task::sleep(std::time::Duration::from_secs(1)).await;
    sing_song(song).await;
}

#[async_std::main]
async fn main() {
    let f1 = learn_and_sing();
    let f2 = dance();
    let f3 = learn_and_sing();
    futures::join!(f1, f2, f3);
}

How to run multiple Tokio async tasks in a loop without using tokio::spawn?

I built a LED clock that also displays weather. My program does a couple of different things in a loop, each thing with a different interval:
updates the LEDs every 50ms,
checks the light level (to adjust the brightness) every 1 second,
fetches weather every 10 minutes,
actually some more, but that's irrelevant.
Updating the LEDs is the most critical: I don't want this to be delayed when e.g. weather is being fetched. This should not be a problem as fetching weather is mostly an async HTTP call.
Here's the code that I have:
let mut measure_light_stream = tokio::time::interval(Duration::from_secs(1));
let mut update_weather_stream = tokio::time::interval(WEATHER_FETCH_INTERVAL);
let mut update_leds_stream = tokio::time::interval(UPDATE_LEDS_INTERVAL);

loop {
    tokio::select! {
        _ = measure_light_stream.tick() => {
            let light = lm.get_light();
            light_smooth.sp = light;
        },
        _ = update_weather_stream.tick() => {
            let fetched_weather = weather_service.get(&config).await;
            // Store the fetched weather for later access from the displaying function.
            weather_clock.weather = fetched_weather.clone();
        },
        _ = update_leds_stream.tick() => {
            // Some code here that actually sets the LEDs.
            // This code accesses the weather_clock, the light level etc.
        },
    }
}
I realised the code doesn't do what I wanted it to do - fetching the weather blocks the execution of the loop. I see why - the docs of tokio::select! say the other branches are cancelled as soon as the update_weather_stream.tick() expression completes.
How do I do this in such a way that while fetching the weather is waiting on network, the LEDs are still updated? I figured out I could use tokio::spawn to start a separate non-blocking "thread" for fetching weather, but then I have problems with weather_service not being Send, let alone weather_clock not being shareable between threads. I don't want this complication, I'm fine with everything running in a single thread, just like what select! does.
Reproducible example
use std::time::Duration;
use tokio::time::{interval, sleep};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut slow_stream = interval(Duration::from_secs(3));
    let mut fast_stream = interval(Duration::from_millis(200));

    // Note how access to this data is straightforward, I do not want
    // this to get more complicated, e.g. care about threads and Send.
    let mut val = 1;

    loop {
        tokio::select! {
            _ = fast_stream.tick() => {
                println!(".{}", val);
            },
            _ = slow_stream.tick() => {
                println!("Starting slow operation...");
                // The problem: During this await the dots are not printed.
                sleep(Duration::from_secs(1)).await;
                val += 1;
                println!("...done");
            },
        }
    }
}
You can use tokio::join! to run multiple async operations concurrently within the same task.
Here's an example:
use std::cell::Cell;
use std::time::Duration;

async fn measure_light(halt: &Cell<bool>) {
    while !halt.get() {
        let light = lm.get_light();
        // ....
        tokio::time::sleep(Duration::from_secs(1)).await;
    }
}

async fn blink_led(halt: &Cell<bool>) {
    while !halt.get() {
        // LED blinking code
        tokio::time::sleep(UPDATE_LEDS_INTERVAL).await;
    }
}

async fn poll_weather(halt: &Cell<bool>) {
    while !halt.get() {
        let weather = weather_service.get(&config).await;
        // ...
        tokio::time::sleep(WEATHER_FETCH_INTERVAL).await;
    }
}

// example on how to terminate execution
async fn terminate(halt: &Cell<bool>) {
    tokio::time::sleep(Duration::from_secs(10)).await;
    halt.set(true);
}

#[tokio::main]
async fn main() {
    let halt = Cell::new(false);

    tokio::join!(
        measure_light(&halt),
        blink_led(&halt),
        poll_weather(&halt),
        terminate(&halt),
    );
}
If you're using tokio::net::TcpStream or other non-blocking IO, then it should allow for concurrent execution.
I've added a Cell flag for halting execution as an example. You can use the same technique to share any mutable state between join branches.
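As a further sketch of sharing state between join! branches (an assumption-laden example, not taken from the original answer): because everything stays on one task and no borrow is held across an .await, non-Copy data can go in a RefCell instead of a Cell.

use std::cell::RefCell;
use std::time::Duration;
use tokio::time::sleep;

#[tokio::main]
async fn main() {
    // Shared state; no Mutex or Arc needed because both branches run on one task.
    let latest_weather = RefCell::new(String::from("unknown"));

    let fetch_weather = async {
        for i in 1..=3 {
            sleep(Duration::from_millis(300)).await;
            // Borrow is taken and released immediately, never across an await.
            *latest_weather.borrow_mut() = format!("update #{}", i);
        }
    };

    let update_leds = async {
        for _ in 0..10 {
            sleep(Duration::from_millis(100)).await;
            println!("LEDs show: {}", latest_weather.borrow());
        }
    };

    tokio::join!(fetch_weather, update_leds);
}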
EDIT: The same thing can be done with tokio::select!. The main difference from your code is that the actual "business logic" is inside the futures awaited by select.
select allows you to drop unfinished futures instead of waiting for them to exit on their own (so the halt termination flag is not necessary).
#[tokio::main]
async fn main() {
    tokio::select! {
        _ = measure_light() => {},
        _ = blink_led() => {},
        _ = poll_weather() => {},
    }
}
Here's a concrete solution, based on the second part of stepan's answer:
use std::time::Duration;
use tokio::time::sleep;

#[tokio::main]
async fn main() {
    // Cell is an acceptable complication when accessing the data.
    let val = std::cell::Cell::new(1);

    tokio::select! {
        _ = async {
            loop {
                println!(".{}", val.get());
                sleep(Duration::from_millis(200)).await;
            }
        } => {},
        _ = async {
            loop {
                println!("Starting slow operation...");
                // Unlike in the question, the dots keep printing during this await.
                sleep(Duration::from_secs(1)).await;
                val.set(val.get() + 1);
                println!("...done");
                sleep(Duration::from_secs(3)).await;
            }
        } => {},
    }
}
Playground link

What happens to an async task when it is aborted?

Rust has async methods that can be tied to Abortable futures. The documentation says that, when aborted:
the future will complete immediately without making any further progress.
Will the variables owned by the task bound to the future be dropped? If those variables implement drop, will drop be called? If the future has spawned other futures, will all of them be aborted in a chain?
E.g.: In the following snippet, I don't see the destructor run for the aborted task, but I don't know whether it is not called or whether it happens on a separate thread where the print is not shown.
use futures::executor::block_on;
use futures::future::{AbortHandle, Abortable};

struct S {
    i: i32,
}

impl Drop for S {
    fn drop(&mut self) {
        println!("dropping S");
    }
}

async fn f() -> i32 {
    let s = S { i: 42 };
    std::thread::sleep(std::time::Duration::from_secs(2));
    s.i
}

fn main() {
    println!("first test...");
    let (abort_handle, abort_registration) = AbortHandle::new_pair();
    let _ = Abortable::new(f(), abort_registration);
    abort_handle.abort();
    std::thread::sleep(std::time::Duration::from_secs(1));

    println!("second test...");
    let (_, abort_registration) = AbortHandle::new_pair();
    let task = Abortable::new(f(), abort_registration);
    block_on(task).unwrap();
    std::thread::sleep(std::time::Duration::from_secs(1));
}
playground
Yes, values that have been created will be dropped.
In your first example, the future returned by f is never started, so the S is never created. This means that it cannot be dropped.
In the second example, the value is dropped.
This is more obvious if you both run the future and abort it. Here, I spawn two concurrent futures:
creates an S and waits 200 ms
waits 100 ms and aborts future #1
use futures::future::{self, AbortHandle, Abortable};
use std::time::Duration;
use tokio::time;

struct S {
    i: i32,
}

impl S {
    fn new(i: i32) -> Self {
        println!("Creating S {}", i);
        S { i }
    }
}

impl Drop for S {
    fn drop(&mut self) {
        println!("Dropping S {}", self.i);
    }
}

#[tokio::main]
async fn main() {
    let create_s = async {
        let s = S::new(42);
        time::delay_for(Duration::from_millis(200)).await;
        println!("Creating {} done", s.i);
    };
    let (abort_handle, abort_registration) = AbortHandle::new_pair();
    let create_s = Abortable::new(create_s, abort_registration);

    let abort_s = async move {
        time::delay_for(Duration::from_millis(100)).await;
        abort_handle.abort();
    };

    let c = tokio::spawn(create_s);
    let a = tokio::spawn(abort_s);

    let (c, a) = future::join(c, a).await;
    println!("{:?}, {:?}", c, a);
}
Creating S 42
Dropping S 42
Ok(Err(Aborted)), Ok(())
Note that I've switched to Tokio to be able to use time::delay_for, as you should never use blocking operations in an async function.
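As a related sketch (an illustration, not part of the answer above): if a genuinely blocking call is unavoidable inside async code, tokio's task::spawn_blocking moves it onto a dedicated blocking thread pool so the executor threads stay free.

use std::time::Duration;

#[tokio::main]
async fn main() {
    // The closure runs on tokio's blocking thread pool, so the async
    // executor threads keep driving other futures in the meantime.
    let answer = tokio::task::spawn_blocking(|| {
        std::thread::sleep(Duration::from_secs(2)); // stand-in for blocking work
        42
    })
    .await
    .unwrap();

    println!("blocking work returned {}", answer);
}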
See also:
Why does Future::select choose the future with a longer sleep period first?
What is the best approach to encapsulate blocking I/O in future-rs?
If the future has spawned other futures, will all of them be aborted in a chain?
No, when you spawn a future, it is disconnected from where it was spawned.
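A small sketch illustrating that point, using the same tokio 0.2 API as above (the child task here is a made-up example): the task spawned inside the aborted future keeps running to completion.

use futures::future::{AbortHandle, Abortable};
use std::time::Duration;
use tokio::time;

#[tokio::main]
async fn main() {
    let parent = async {
        // Spawned tasks are detached: aborting the parent does not abort them.
        tokio::spawn(async {
            time::delay_for(Duration::from_millis(100)).await;
            println!("child still ran");
        });
        time::delay_for(Duration::from_secs(10)).await;
        println!("parent finished"); // never reached
    };

    let (abort_handle, abort_registration) = AbortHandle::new_pair();
    let parent = tokio::spawn(Abortable::new(parent, abort_registration));

    // Let the parent start (and spawn the child), then abort it.
    time::delay_for(Duration::from_millis(50)).await;
    abort_handle.abort();
    println!("parent result: {:?}", parent.await);

    // Give the detached child time to print before main returns.
    time::delay_for(Duration::from_millis(200)).await;
}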
See also:
What is the purpose of async/await in Rust?
