Rust tokio alternative to fold and map to run a function concurrently with different inputs - asynchronous

I need a way to run the same function many times with different inputs.
And since the function depends on a slow web API, I need to run it concurrently and collect the results in one variable.
I use the following:
use tokio_stream::StreamExt;
async fn run(input: &str) -> Vec<String> {
vec![String::from(input), String::from(input)]
}
#[tokio::main]
async fn main() {
let mut input = tokio_stream::iter(vec!["1","2","3","4","5","6","7","8"]);
let mut handles = vec![];
while let Some(domain) = input.next().await {
handles.push(run(domain));
}
let mut results = vec![];
let mut handles = tokio_stream::iter(handles);
while let Some(handle) = handles.next().await {
results.extend(handle.await);
}
}
I know there is a way with the futures crate, but I don't know if I can use it with tokio. Also tokio_stream::StreamExt contains fold and map methods but I can't find a way to use them without calling await.
What is the best way to do this?

IIUC what you want, you can use tokio::spawn to launch your tasks in the background and futures::future::join_all to wait until they have all completed. E.g. something like this (untested):
async fn run(input: &str) -> Vec<String> {
    vec![String::from(input), String::from(input)]
}
#[tokio::main]
async fn main() {
    let input = vec!["1","2","3","4","5","6","7","8"];
    // Spawn one task per input; into_iter() yields &'static str, so each task is 'static.
    let handles = input.into_iter().map(|domain| tokio::spawn(run(domain)));
    // Each element is a Result<Vec<String>, JoinError>, in the same order as the inputs.
    let results = futures::future::join_all(handles).await;
}
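join_all yields one Result per spawned task, whereas the question asks for everything in one variable. A minimal sketch of that last step, using only the standard library: the generic error type E stands in for tokio's JoinError, and flatten_results is a hypothetical helper name, not part of either crate.

```rust
/// Flattens the per-task outputs into one Vec, stopping at the first error.
/// `E` is a placeholder for whatever error the join step reports.
fn flatten_results<E>(per_task: Vec<Result<Vec<String>, E>>) -> Result<Vec<String>, E> {
    let mut all = Vec::new();
    for output in per_task {
        // `?` returns early with the first failed task's error.
        all.extend(output?);
    }
    Ok(all)
}
```

Whether to stop at the first failed task or to skip it is a design choice; this sketch stops.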

Related

Is it possible to implement a feature like Java's CompletableFuture::complete in Rust?

I'm a beginner in Rust, and I'm trying to use Rust's asynchronous programming.
In my scenario, I want to create an empty Future and complete it from another thread after a complex multi-round scheduling process. Java's CompletableFuture::complete meets this need very well.
I have tried to find an equivalent in Rust, but haven't found one yet.
Is it possible to do it in Rust?
I understand from the comments below that using a channel for this is more in line with Rust's design.
My scenario is a hierarchical scheduling executor.
For example, Task1 will be split into several Drivers. Each Driver uses multiple threads (a rayon thread pool) to do some computation, and a state change in one driver triggers the execution of the next. The result of the whole task is the last driver's output; the intermediate drivers have no output. That is to say, my async function cannot get the result from a single spawned task directly, so I need a shared variable or a channel to transfer the result.
So what I really want is this: the last driver, which runs on a rayon thread, should be able to look up a channel's tx by its identifier without storing it (to simplify the state-change process).
I found that the tx and rx of a oneshot channel cannot be copied and are not thread safe, and that tx's send method takes ownership. So I can't store the tx in the main thread and let the last driver find its tx by identifier. I can do that with mpsc, but I have to create the channel with capacity 1 and close it manually.
I wrote the two demos below. Is this an appropriate and efficient use of mpsc?
Version implemented using oneshot; this does not work:
#[tokio::test]
pub async fn test_async() -> Result<()>{
let mut executor = Executor::new();
let res1 = executor.run(1).await?;
let res2 = executor.run(2).await?;
println!("res1 {}, res2 {}", res1, res2);
Ok(())
}
struct Executor {
pub pool: ThreadPool,
pub txs: Arc<DashMap<i32, RwLock<oneshot::Sender<i32>>>>,
}
impl Executor {
pub fn new() -> Self {
Executor{
pool: ThreadPoolBuilder::new().num_threads(10).build().unwrap(),
txs: Arc::new(DashMap::new()),
}
}
pub async fn run(&mut self, index: i32) -> Result<i32> {
let (tx, rx) = oneshot::channel();
self.txs.insert(index, RwLock::new(tx));
let txs_clone = self.txs.clone();
self.pool.spawn(move || {
let spawn_tx = txs_clone.get(&index).unwrap();
let guard = block_on(spawn_tx.read());
// cannot work, send need ownership, it will cause move of self
guard.send(index);
});
let res = rx.await;
return Ok(res.unwrap());
}
}
Version implemented using mpsc; this works, but I am not sure about its performance:
#[tokio::test]
pub async fn test_async() -> Result<()>{
let mut executor = Executor::new();
let res1 = executor.run(1).await?;
let res2 = executor.run(2).await?;
println!("res1 {}, res2 {}", res1, res2);
// close channel after task finished
executor.close(1);
executor.close(2);
Ok(())
}
struct Executor {
pub pool: ThreadPool,
pub txs: Arc<DashMap<i32, RwLock<mpsc::Sender<i32>>>>,
}
impl Executor {
pub fn new() -> Self {
Executor{
pool: ThreadPoolBuilder::new().num_threads(10).build().unwrap(),
txs: Arc::new(DashMap::new()),
}
}
pub fn close(&mut self, index:i32) {
self.txs.remove(&index);
}
pub async fn run(&mut self, index: i32) -> Result<i32> {
let (tx, mut rx) = mpsc::channel(1);
self.txs.insert(index, RwLock::new(tx));
let txs_clone = self.txs.clone();
self.pool.spawn(move || {
let spawn_tx = txs_clone.get(&index).unwrap();
let guard = block_on(spawn_tx.value().read());
block_on(guard.deref().send(index));
});
// 0 mock invalid value
let mut res:i32 = 0;
while let Some(data) = rx.recv().await {
println!("recv data {}", data);
res = data;
break;
}
return Ok(res);
}
}
Disclaimer: It's really hard to picture what you are attempting to achieve, because the examples provided are trivial to solve, with no justification for the added complexity (DashMap). As such, this answer will be progressive, though it will remain focused on solving the problem you demonstrated you had, and not necessarily the problem you're thinking of... as I have no crystal ball.
We'll be using the following Result type in the examples:
type Result<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync + 'static>>;
Serial execution
The simplest way to execute a task, is to do so right here, right now.
use std::future::Future;
impl Executor {
    pub async fn run<F, Fut>(&self, task: F) -> Result<i32>
    where
        F: FnOnce() -> Fut,
        Fut: Future<Output = Result<i32>>,
    {
        task().await
    }
}
Async execution - built-in
When the execution of a task may involve heavy-weight calculations, it may be beneficial to execute it on a background thread.
Whichever runtime you are using probably supports this functionality, I'll demonstrate with tokio:
impl Executor {
    pub async fn run<F>(&self, task: F) -> Result<i32>
    where
        F: FnOnce() -> Result<i32> + Send + 'static,
    {
        // The first `?` handles the JoinError, the second the task's own error.
        Ok(tokio::task::spawn_blocking(task).await??)
    }
}
Async execution - one-shot
If you wish to have more control over the number of CPU-bound threads, either to limit them, or to partition the CPUs of the machine for different needs, then the async runtime may not be configurable enough and you may prefer to use a thread-pool instead.
In this case, synchronization back with the runtime can be achieved via channels, the simplest of which being the oneshot channel.
impl Executor {
    pub async fn run<F>(&self, task: F) -> Result<i32>
    where
        F: FnOnce() -> Result<i32> + Send + 'static,
    {
        let (tx, rx) = oneshot::channel();
        self.pool.spawn(move || {
            let result = task();
            // Decide on how to handle the fact that nobody will read the result.
            let _ = tx.send(result);
        });
        // The first `?` handles a dropped sender, the second the task's own error.
        Ok(rx.await??)
    }
}
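The same hand-off can be tried without any crates. The following is only a sketch under stated assumptions: std::thread::spawn stands in for the rayon pool, std::sync::mpsc stands in for the oneshot channel, the receiver blocks instead of awaiting, and run_on_worker is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;

/// Runs `task` on another thread and hands its result back over a channel.
/// The sender is moved into the worker, so nothing has to be stored or looked up.
fn run_on_worker<F>(task: F) -> Result<i32, String>
where
    F: FnOnce() -> Result<i32, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore the send error: it only means the receiver has gone away.
        let _ = tx.send(task());
    });
    // A receive error means the worker dropped `tx` without sending (e.g. it panicked).
    rx.recv().map_err(|e| e.to_string())?
}
```

As in the versions above, the task itself does not know how it is executed; only run_on_worker touches the channel.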
Note that in all of the above solutions, task remains agnostic as to how it's executed. This is a property you should strive for, as it makes it easier to change the way execution is handled in the future by more neatly separating the two concepts.

How to integrate async data collection with threadpool data processing in Rust

I'd like to improve the integration of my async data collection with my rayon data processing by overlapping the retrieval and the processing. Currently, I pull lots of pages from a web site using normal async code. Once that is complete, I do the cpu-intensive work using rayon's par_iter.
It seems like I should be able to easily overlap the processing, so that I'm not waiting for every last page before I begin the grunt work. Every page that I retrieve is independent of the others, so there is no need to wait before the conversion.
Here's what I have working currently (simplified just a bit):
use rayon::prelude::*;
use futures::{stream, StreamExt};
use reqwest::{Client, Result};
const CONCURRENT_REQUESTS: usize = usize::MAX;
const MAX_PAGE: usize = 1000;
#[tokio::main]
async fn main() {
// get data from server
let client = Client::new();
let bodies: Vec<Result<String>> = stream::iter(1..MAX_PAGE+1)
.map(|page_number| {
let client = &client;
async move {
client
.get(format!("https://someurl?{page_number}"))
.send()
.await?
.text()
.await
}
})
.buffer_unordered(CONCURRENT_REQUESTS)
.collect()
.await;
// transform the data
let mut rows: Vec<MyRow> = bodies
.par_iter()
.filter_map(|body| body.as_ref().ok())
.map(|data| {
let page = serde_json::from_str::<MyPage>(data).unwrap();
page.rows
.iter()
.map(|x| MyRow::new(x))
.collect::<Vec<MyRow>>()
})
.flatten()
.collect();
// do something with rows
}
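One possible shape for the overlap, sketched with the standard library only: a producer sends each page over a channel as soon as it is available, and the consumer converts pages as they arrive instead of waiting for the full Vec. Everything here is an assumption for illustration: fetch_page and parse_rows are hypothetical stand-ins for the reqwest download and the serde_json conversion, and a plain thread replaces both the async stream and the rayon pool.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for downloading one page.
fn fetch_page(page_number: usize) -> String {
    format!("{},{}", page_number, page_number * 10)
}

/// Hypothetical stand-in for the CPU-heavy conversion of one page into rows.
fn parse_rows(body: &str) -> Vec<u64> {
    body.split(',').filter_map(|field| field.parse().ok()).collect()
}

fn fetch_and_process(max_page: usize) -> Vec<u64> {
    let (tx, rx) = mpsc::channel::<String>();
    // Producer: pushes each page into the channel as soon as it has been fetched.
    let producer = thread::spawn(move || {
        for page_number in 1..=max_page {
            if tx.send(fetch_page(page_number)).is_err() {
                break; // the consumer has gone away
            }
        }
        // `tx` is dropped here, which ends the consumer loop below.
    });
    // Consumer: starts converting while later pages are still being fetched.
    let mut rows = Vec::new();
    for body in rx {
        rows.extend(parse_rows(&body));
    }
    producer.join().expect("producer thread panicked");
    rows
}
```

In the real program the producer side would be the buffer_unordered stream and the consumer side would hand each body to the thread pool; the channel between them is the part this sketch is meant to show.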

Rust: async is not concurrent

Here's the example from the Rust book.
async fn learn_and_sing() {
// Wait until the song has been learned before singing it.
// We use `.await` here rather than `block_on` to prevent blocking the
// thread, which makes it possible to `dance` at the same time.
let song = learn_song().await;
sing_song(song).await;
}
async fn async_main() {
let f1 = learn_and_sing();
let f2 = dance();
// `join!` is like `.await` but can wait for multiple futures concurrently.
// If we're temporarily blocked in the `learn_and_sing` future, the `dance`
// future will take over the current thread. If `dance` becomes blocked,
// `learn_and_sing` can take back over. If both futures are blocked, then
// `async_main` is blocked and will yield to the executor.
futures::join!(f1, f2);
}
fn main() {
block_on(async_main());
}
And it says:
In this example, learning the song must happen before singing the song, but both learning and singing can happen at the same time as dancing.
But I don't get this point. I wrote a short program in Rust:
async fn learn_song() -> &'static str {
println!("learn_song");
"some song"
}
#[allow(unused_variables)]
async fn sing_song(song: &str) {
println!("sing_song");
}
async fn dance() {
println!("dance");
}
async fn learn_and_sing() {
let song = learn_song().await;
std::thread::sleep(std::time::Duration::from_secs(1));
sing_song(song).await;
}
async fn async_main() {
let f1 = learn_and_sing();
let f2 = dance();
let f3 = learn_and_sing();
futures::join!(f1, f2, f3);
}
fn main() {
futures::executor::block_on(async_main());
}
And it seems like all the async functions in the async_main executed synchronously.
The output is
learn_song
sing_song
dance
learn_song
sing_song
If they run asynchronously, I would expect to get something like this in my output
learn_song
dance
learn_song
sing_song
sing_song
If I add an extra call of learn_and_sing, it is still printed as in a synchronous function.
The question: why is that? Is it possible to get real asynchronous behavior using only async/.await and no threads?
Like tkausl's comment states, std::thread::sleep makes the whole thread sleep, which prevents any code on the thread from executing during the sleeping duration. You could use async_std::task::sleep in this situation, as it is an asynchronous version of the sleep function.
async fn learn_song() -> &'static str {
println!("learn_song");
"some song"
}
#[allow(unused_variables)]
async fn sing_song(song: &str) {
println!("sing_song");
}
async fn dance() {
println!("dance");
}
async fn learn_and_sing() {
let song = learn_song().await;
async_std::task::sleep(std::time::Duration::from_secs(1)).await;
sing_song(song).await;
}
#[async_std::main]
async fn main() {
let f1 = learn_and_sing();
let f2 = dance();
let f3 = learn_and_sing();
futures::join!(f1, f2, f3);
}
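To see why the two outputs in the question differ, here is a toy model that needs no runtime crate. It is a sketch, not how futures::join! or async-std are implemented: run_all is a hypothetical round-robin poller, and YieldOnce is a hypothetical future that stands in for an async sleep by returning Pending once. Interleaving can only happen at an await that actually returns Pending; a blocking sleep never does.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type Log = Arc<Mutex<Vec<&'static str>>>;

/// Waker that does nothing; enough here because `run_all` simply polls in a loop.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Returns `Pending` on the first poll and `Ready` on the second:
/// a point where the task gives the thread back to the poller.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

async fn learn_and_sing(log: Log, yield_between: bool) {
    log.lock().unwrap().push("learn_song");
    if yield_between {
        YieldOnce(false).await;
    }
    log.lock().unwrap().push("sing_song");
}

async fn dance(log: Log) {
    log.lock().unwrap().push("dance");
}

/// Toy join: polls every future in turn on the current thread until all finish.
fn run_all(mut futures: Vec<Pin<Box<dyn Future<Output = ()>>>>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    while !futures.is_empty() {
        // Keep only the futures that are still pending after this round.
        futures.retain_mut(|f| f.as_mut().poll(&mut cx).is_pending());
    }
}

/// Returns the order in which the three tasks logged their steps.
fn observed_order(yield_between: bool) -> Vec<&'static str> {
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let futures: Vec<Pin<Box<dyn Future<Output = ()>>>> = vec![
        Box::pin(learn_and_sing(log.clone(), yield_between)),
        Box::pin(dance(log.clone())),
        Box::pin(learn_and_sing(log.clone(), yield_between)),
    ];
    run_all(futures);
    let order = log.lock().unwrap().clone();
    order
}
```

observed_order(false) reproduces the sequential output from the question, and observed_order(true) gives the interleaved order the asker expected.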

How to extract values from async functions to a non-async one? [duplicate]

I am trying to use hyper to grab the content of an HTML page and would like to synchronously return the output of a future. I realized I could have picked a better example since synchronous HTTP requests already exist, but I am more interested in understanding whether we could return a value from an async calculation.
extern crate futures;
extern crate hyper;
extern crate hyper_tls;
extern crate tokio;
use futures::{future, Future, Stream};
use hyper::Client;
use hyper::Uri;
use hyper_tls::HttpsConnector;
use std::str;
fn scrap() -> Result<String, String> {
let scraped_content = future::lazy(|| {
let https = HttpsConnector::new(4).unwrap();
let client = Client::builder().build::<_, hyper::Body>(https);
client
.get("https://hyper.rs".parse::<Uri>().unwrap())
.and_then(|res| {
res.into_body().concat2().and_then(|body| {
let s_body: String = str::from_utf8(&body).unwrap().to_string();
futures::future::ok(s_body)
})
}).map_err(|err| format!("Error scraping web page: {:?}", &err))
});
scraped_content.wait()
}
fn read() {
let scraped_content = future::lazy(|| {
let https = HttpsConnector::new(4).unwrap();
let client = Client::builder().build::<_, hyper::Body>(https);
client
.get("https://hyper.rs".parse::<Uri>().unwrap())
.and_then(|res| {
res.into_body().concat2().and_then(|body| {
let s_body: String = str::from_utf8(&body).unwrap().to_string();
println!("Reading body: {}", s_body);
Ok(())
})
}).map_err(|err| {
println!("Error reading webpage: {:?}", &err);
})
});
tokio::run(scraped_content);
}
fn main() {
read();
let content = scrap();
println!("Content = {:?}", &content);
}
The example compiles and the call to read() succeeds, but the call to scrap() returns the following error:
Content = Err("Error scraping web page: Error { kind: Execute, cause: None }")
I understand that I failed to launch the task properly before calling .wait() on the future but I couldn't find how to properly do it, assuming it's even possible.
Standard library futures
Let's use this as our minimal, reproducible example:
async fn example() -> i32 {
42
}
Call executor::block_on:
use futures::executor; // 0.3.1
fn main() {
let v = executor::block_on(example());
println!("{}", v);
}
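For intuition about what a block_on does, here is a minimal sketch using only the standard library. It is not the futures crate's implementation; ThreadWaker is a name chosen for this illustration. The idea is to poll the future on the current thread and park the thread until the waker unparks it.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread that is blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drives `future` to completion on the current thread and returns its output.
fn block_on<F: Future>(future: F) -> F::Output {
    // Pin the future on the stack so it can be polled.
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // Sleep until some other thread calls `wake`, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

async fn example() -> i32 {
    42
}
```

Calling block_on(example()) from a synchronous function returns 42, which is the same bridge from async to sync that the runtime-provided functions offer.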
Tokio
Use the tokio::main attribute on any function (not just main!) to convert it from an asynchronous function to a synchronous one:
use tokio; // 0.3.5
#[tokio::main]
async fn main() {
let v = example().await;
println!("{}", v);
}
tokio::main is a macro that transforms this
#[tokio::main]
async fn main() {}
Into this:
fn main() {
tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()
.unwrap()
.block_on(async { {} })
}
This uses Runtime::block_on under the hood, so you can also write this as:
use tokio::runtime::Runtime; // 0.3.5
fn main() {
let v = Runtime::new().unwrap().block_on(example());
println!("{}", v);
}
For tests, you can use tokio::test.
async-std
Use the async_std::main attribute on the main function to convert it from an asynchronous function to a synchronous one:
use async_std; // 1.6.5, features = ["attributes"]
#[async_std::main]
async fn main() {
let v = example().await;
println!("{}", v);
}
For tests, you can use async_std::test.
Futures 0.1
Let's use this as our minimal, reproducible example:
use futures::{future, Future}; // 0.1.27
fn example() -> impl Future<Item = i32, Error = ()> {
future::ok(42)
}
For simple cases, you only need to call wait:
fn main() {
let s = example().wait();
println!("{:?}", s);
}
However, this comes with a pretty severe warning:
This method is not appropriate to call on event loops or similar I/O situations because it will prevent the event loop from making progress (this blocks the thread). This method should only be called when it's guaranteed that the blocking work associated with this future will be completed by another thread.
Tokio
If you are using Tokio 0.1, you should use Tokio's Runtime::block_on:
use tokio; // 0.1.21
fn main() {
let mut runtime = tokio::runtime::Runtime::new().expect("Unable to create a runtime");
let s = runtime.block_on(example());
println!("{:?}", s);
}
If you peek in the implementation of block_on, it actually sends the future's result down a channel and then calls wait on that channel! This is fine because Tokio guarantees to run the future to completion.
See also:
How can I efficiently extract the first element of a futures::Stream in a blocking manner?
As this is the top result that comes up in search engines for the query "How to call async from sync in Rust", I decided to share my solution here. I think it might be useful.
As @Shepmaster mentioned, back in version 0.1 the futures crate had a convenient method, .wait(), that could be used to call an async function from a sync one. This method, however, was removed from later versions of the crate.
Luckily, it's not that hard to re-implement it:
trait Block {
fn wait(self) -> <Self as futures::Future>::Output
where Self: Sized, Self: futures::Future
{
futures::executor::block_on(self)
}
}
impl<F,T> Block for F
where F: futures::Future<Output = T>
{}
After that, you can just do the following:
async fn example() -> i32 {
42
}
fn main() {
let s = example().wait();
println!("{:?}", s);
}
Beware that this comes with all the caveats of the original .wait() explained in @Shepmaster's answer.
This works for me using tokio:
tokio::runtime::Runtime::new()?.block_on(fooAsyncFunction())?;

What happens to an async task when it is aborted?

Rust has async methods that can be tied to Abortable futures. The documentation says that, when aborted:
the future will complete immediately without making any further progress.
Will the variables owned by the task bound to the future be dropped? If those variables implement drop, will drop be called? If the future has spawned other futures, will all of them be aborted in a chain?
E.g.: In the following snippet, I don't see the destructor happening for the aborted task, but I don't know if it is not called or happens in a separate thread where the print is not shown.
use futures::executor::block_on;
use futures::future::{AbortHandle, Abortable};
struct S {
i: i32,
}
impl Drop for S {
fn drop(&mut self) {
println!("dropping S");
}
}
async fn f() -> i32 {
let s = S { i: 42 };
std::thread::sleep(std::time::Duration::from_secs(2));
s.i
}
fn main() {
println!("first test...");
let (abort_handle, abort_registration) = AbortHandle::new_pair();
let _ = Abortable::new(f(), abort_registration);
abort_handle.abort();
std::thread::sleep(std::time::Duration::from_secs(1));
println!("second test...");
let (_, abort_registration) = AbortHandle::new_pair();
let task = Abortable::new(f(), abort_registration);
block_on(task).unwrap();
std::thread::sleep(std::time::Duration::from_secs(1));
}
Yes, values that have been created will be dropped.
In your first example, the future returned by f is never started, so the S is never created. This means that it cannot be dropped.
In the second example, the value is dropped.
This is more obvious if you both run the future and abort it. Here, I spawn two concurrent futures:
1. creates an S and waits 200ms
2. waits 100ms and aborts future #1
use futures::future::{self, AbortHandle, Abortable};
use std::time::Duration;
use tokio::time;
struct S {
i: i32,
}
impl S {
fn new(i: i32) -> Self {
println!("Creating S {}", i);
S { i }
}
}
impl Drop for S {
fn drop(&mut self) {
println!("Dropping S {}", self.i);
}
}
#[tokio::main]
async fn main() {
let create_s = async {
let s = S::new(42);
time::delay_for(Duration::from_millis(200)).await;
println!("Creating {} done", s.i);
};
let (abort_handle, abort_registration) = AbortHandle::new_pair();
let create_s = Abortable::new(create_s, abort_registration);
let abort_s = async move {
time::delay_for(Duration::from_millis(100)).await;
abort_handle.abort();
};
let c = tokio::spawn(create_s);
let a = tokio::spawn(abort_s);
let (c, a) = future::join(c, a).await;
println!("{:?}, {:?}", c, a);
}
Creating S 42
Dropping S 42
Ok(Err(Aborted)), Ok(())
Note that I've switched to Tokio to be able to use time::delay_for, as you should never use blocking operations in an async function.
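The two cases (never started versus started and then abandoned) can also be checked without a runtime. The following is a sketch under stated assumptions: dropping the future by hand stands in for the abort, YieldOnce is a hypothetical future that suspends once in place of the timer, and events are recorded in a log instead of being printed.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type Log = Arc<Mutex<Vec<&'static str>>>;

struct S {
    log: Log,
}
impl S {
    fn new(log: Log) -> Self {
        log.lock().unwrap().push("created");
        S { log }
    }
}
impl Drop for S {
    fn drop(&mut self) {
        self.log.lock().unwrap().push("dropped");
    }
}

/// Waker that does nothing; the future is polled by hand at most once.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Returns `Pending` on the first poll, standing in for an async timer.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

async fn create_s(log: Log) {
    let _s = S::new(log.clone());
    YieldOnce(false).await; // the future is suspended here while it owns `_s`
    log.lock().unwrap().push("finished");
}

/// Returns what was logged when the future is dropped before it completes.
fn events(poll_before_drop: bool) -> Vec<&'static str> {
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let mut future = Box::pin(create_s(log.clone()));
    if poll_before_drop {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        // The first poll runs up to the yield point, so `S` now lives inside the future.
        assert!(future.as_mut().poll(&mut cx).is_pending());
    }
    // Dropping the future drops every value it currently owns.
    drop(future);
    let seen = log.lock().unwrap().clone();
    seen
}
```

With poll_before_drop set to false nothing is logged, because the body never ran; with true the log holds "created" followed by "dropped", and "finished" never appears.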
See also:
Why does Future::select choose the future with a longer sleep period first?
What is the best approach to encapsulate blocking I/O in future-rs?
If the future has spawned other futures, will all of them be aborted in a chain?
No, when you spawn a future, it is disconnected from where it was spawned.
See also:
What is the purpose of async/await in Rust?
