Why does holding a non-Send type across an await point result in a non-Send Future? - asynchronous

In the documentation for the Send trait, there is a nice example of how something like Rc is not Send, since cloning/dropping in two different threads can cause the reference count to get out of sync.
What is less clear, however, is why holding a binding to a non-Send type across an await point in an async fn causes the generated future to also be non-Send. I was able to find a work around for when the compiler has been too conservative in the work-arounds chapter of the async handbook, but it does not go as far as answering the question that I am asking here.
Perhaps someone could shed some light on this with an example of why having a non-Send type in a Future is ok, but holding it across an await is not?

When you use .await in an async function, the compiler builds a state machine behind the scenes. Each .await introduces a new state (while it waits for something) and the code in between are state transitions (aka tasks), which will be triggered based on some external event (e.g. from IO or a timer etc).
Each task gets scheduled to be executed by the async runtime, which could choose to use a different thread from the previous task. If the state transition is not safe to be sent between threads then the resulting Future is also not Send so that you get a compilation error if you try to execute it in a multi-threaded runtime.
It is completely OK for a Future not to be Send, it just means you can only execute it in a single-threaded runtime.
Perhaps someone could shed some light on this with an example of why having a non-Send type in a Future is ok, but holding it across an await is not?
Consider the following simple example:
async fn add_votes(current: Rc<Cell<i32>>, post: Url) {
let new_votes = get_votes(&post).await;
*current += new_votes;
}
The compiler will construct a state machine like this (simplified):
enum AddVotes {
Initial {
current: Rc<Cell<i32>>,
post: Url,
},
WaitingForGetVotes {
current: Rc<Cell<i32>>,
fut: GetVotesFut,
},
}
impl AddVotes {
fn new(current: Rc<Cell<i32>>, post: Url) {
AddVotes::Initial { current, post }
}
fn poll(&mut self) -> Poll {
match self {
AddVotes::Initial(state) => {
let fut = get_votes(&state.post);
*self = AddVotes::WaitingForGetVotes {
current: state.current,
fut
}
Poll::Pending
}
AddVotes::WaitingForGetVotes(state) => {
if let Poll::Ready(votes) = state.fut.poll() {
*state.current += votes;
Poll::Ready(())
} else {
Poll::Pending
}
}
}
}
}
In a multithreaded runtime, each call to poll could be from a different thread, in which case the runtime would move the AddVotes to the other thread before calling poll on it. This won't work because Rc cannot be sent between threads.
However, if the future just used an Rc within the same state transition, it would be fine, e.g. if votes was just an i32:
async fn add_votes(current: i32, post: Url) -> i32 {
let new_votes = get_votes(&post).await;
// use an Rc for some reason:
let rc = Rc::new(1);
println!("rc value: {:?}", rc);
current + new_votes
}
In which case, the state machine would look like this:
enum AddVotes {
Initial {
current: i32,
post: Url,
},
WaitingForGetVotes {
current: i32,
fut: GetVotesFut,
},
}
The Rc isn't captured in the state machine because it is created and dropped within the state transition (task), so the whole state machine (aka Future) is still Send.

Related

Is it possible to implement a feature like Java's CompletableFuture::complete in Rust?

I'm a beginner in rust, and I'm trying to use rust's asynchronous programming.
In my requirement scenario, I want to create a empty Future and complete it in another thread after a complex multi-round scheduling process. The CompletableFuture::complete of Java can meet my needs very well.
I have tried to find an implementation of Rust, but haven't found one yet.
Is it possible to do it in Rust?
I understand from the comments below that using a channel for this is more in line with rust's design.
My scenario is a hierarchical scheduling executor.
For example, Task1 will be splitted to several Drivers, each Driver will use multi thread(rayon threadpool) to do some computation work, and the former driver's state change will trigger the execution of next driver, the result of the whole task is the last driver's output and the intermedia drivers have no output. That is to say, my async function cannot get result from one spawn task directly, so I need a shared stack variable or a channel to transfer the result.
So what I really want is this: the last driver which is executed in a rayon thread, it can get a channel's tx by it's identify without storing it (to simplify the state change process).
I found the tx and rx of oneshot cannot be copies and they are not thread safe, and the send method of tx need ownership. So, I can't store the tx in main thread and let the last driver find it's tx by identify. But I can use mpsc to do that, I worte 2 demos and pasted it into the body of the question, but I have to create mpsc with capacity 1 and close it manually.
I wrote 2 demos, as bellow.I wonder if this is an appropriate and efficient use of mpsc?
Version implemented using oneshot, cannot work.
#[tokio::test]
pub async fn test_async() -> Result<()>{
let mut executor = Executor::new();
let res1 = executor.run(1).await?;
let res2 = executor.run(2).await?;
println!("res1 {}, res2 {}", res1, res2);
Ok(())
}
struct Executor {
pub pool: ThreadPool,
pub txs: Arc<DashMap<i32, RwLock<oneshot::Sender<i32>>>>,
}
impl Executor {
pub fn new() -> Self {
Executor{
pool: ThreadPoolBuilder::new().num_threads(10).build().unwrap(),
txs: Arc::new(DashMap::new()),
}
}
pub async fn run(&mut self, index: i32) -> Result<i32> {
let (tx, rx) = oneshot::channel();
self.txs.insert(index, RwLock::new(tx));
let txs_clone = self.txs.clone();
self.pool.spawn(move || {
let spawn_tx = txs_clone.get(&index).unwrap();
let guard = block_on(spawn_tx.read());
// cannot work, send need ownership, it will cause move of self
guard.send(index);
});
let res = rx.await;
return Ok(res.unwrap());
}
}
Version implemented using mpsc, can work, not sure about performance
#[tokio::test]
pub async fn test_async() -> Result<()>{
let mut executor = Executor::new();
let res1 = executor.run(1).await?;
let res2 = executor.run(2).await?;
println!("res1 {}, res2 {}", res1, res2);
// close channel after task finished
executor.close(1);
executor.close(2);
Ok(())
}
struct Executor {
pub pool: ThreadPool,
pub txs: Arc<DashMap<i32, RwLock<mpsc::Sender<i32>>>>,
}
impl Executor {
pub fn new() -> Self {
Executor{
pool: ThreadPoolBuilder::new().num_threads(10).build().unwrap(),
txs: Arc::new(DashMap::new()),
}
}
pub fn close(&mut self, index:i32) {
self.txs.remove(&index);
}
pub async fn run(&mut self, index: i32) -> Result<i32> {
let (tx, mut rx) = mpsc::channel(1);
self.txs.insert(index, RwLock::new(tx));
let txs_clone = self.txs.clone();
self.pool.spawn(move || {
let spawn_tx = txs_clone.get(&index).unwrap();
let guard = block_on(spawn_tx.value().read());
block_on(guard.deref().send(index));
});
// 0 mock invalid value
let mut res:i32 = 0;
while let Some(data) = rx.recv().await {
println!("recv data {}", data);
res = data;
break;
}
return Ok(res);
}
}
Disclaimer: It's really hard to picture what you are attempting to achieve, because the examples provided are trivial to solve, with no justification for the added complexity (DashMap). As such, this answer will be progressive, though it will remain focused on solving the problem you demonstrated you had, and not necessarily the problem you're thinking of... as I have no crystal ball.
We'll be using the following Result type in the examples:
type Result<T> = Result<T, Box<dyn Error + Send + Sync + 'static>>;
Serial execution
The simplest way to execute a task, is to do so right here, right now.
impl Executor {
pub async fn run<F>(&self, task: F) -> Result<i32>
where
F: FnOnce() -> Future<Output = Result<i32>>,
{
task().await
}
}
Async execution - built-in
When the execution of a task may involve heavy-weight calculations, it may be beneficial to execute it on a background thread.
Whichever runtime you are using probably supports this functionality, I'll demonstrate with tokio:
impl Executor {
pub async fn run<F>(&self, task: F) -> Result<i32>
where
F: FnOnce() -> Result<i32>,
{
Ok(tokio::task::spawn_block(task).await??)
}
}
Async execution - one-shot
If you wish to have more control on the number of CPU-bound threads, either to limit them, or to partition the CPUs of the machine for different needs, then the async runtime may not be configurable enough and you may prefer to use a thread-pool instead.
In this case, synchronization back with the runtime can be achieved via channels, the simplest of which being the oneshot channel.
impl Executor {
pub async fn run<F>(&self, task: F) -> Result<i32>
where
F: FnOnce() -> Result<i32>,
{
let (tx, mut rx) = oneshot::channel();
self.pool.spawn(move || {
let result = task();
// Decide on how to handle the fact that nobody will read the result.
let _ = tx.send(result);
});
Ok(rx.await??)
}
}
Note that in all of the above solutions, task remains agnostic as to how it's executed. This is a property you should strive for, as it makes it easier to change the way execution is handled in the future by more neatly separating the two concepts.

Rust - How to use a synchronous and an asynchronous crate in one application

I began writing a program using the Druid crate and Crabler crate to make a webscraping application whose data I can explore. I only realized that merging synchronous and asynchronous programming was a bad idea long after I had spent a while building this program. What I am trying to do right now is have the scraper run while the application is open (preferably every hour).
Right now the scraper doesn't run until after the application is closed. I tried to use Tokio's spawn to make a separate thread that starts before the application opens, but this doesn't work because the Crabler future doesn't have the "Send" trait.
I tried to make a minimal functional program as shown below. The title_handler doesn't function as expected but otherwise it demonstrates the issue I'm having well.
Is it possible to allow the WebScraper to run while the application is open? If so, how?
EDIT: I tried using task::spawn_blocking() to run the application and it threw out a ton of errors, including that druid doesn't implement the trait Send.
use crabler::*;
use druid::widget::prelude::*;
use druid::widget::{Align, Flex, Label, TextBox};
use druid::{AppLauncher, Data, Lens, WindowDesc, WidgetExt};
const ENTRY_PREFIX: [&str; 1] = ["https://duckduckgo.com/?t=ffab&q=rust&ia=web"];
// Use WebScraper trait to get each item with the ".result__title" class
#[derive(WebScraper)]
#[on_response(response_handler)]
#[on_html(".result__title", title_handler)]
struct Scraper {}
impl Scraper {
// Print webpage status
async fn response_handler(&self, response: Response) -> Result<()> {
println!("Status {}", response.status);
Ok(())
}
async fn title_handler(&self, _: Response, el: Element) -> Result<()> {
// Get text of element
let title_data = el.children();
let title_text = title_data.first().unwrap().text().unwrap();
println!("Result is {}", title_text);
Ok(())
}
}
// Run scraper to get info from https://duckduckgo.com/?t=ffab&q=rust&ia=web
async fn one_scrape() -> Result<()> {
let scraper = Scraper {};
scraper.run(Opts::new().with_urls(ENTRY_PREFIX.to_vec()).with_threads(1)).await
}
#[derive(Clone, Data, Lens)]
struct Init {
tag: String,
}
fn build_ui() -> impl Widget<Init> {
// Search box
let l_search = Label::new("Search: ");
let tb_search = TextBox::new()
.with_placeholder("Enter tag to search")
.lens(Init::tag);
let search = Flex::row()
.with_child(l_search)
.with_child(tb_search);
// Describe layout of UI
let layout = Flex::column()
.with_child(search);
Align::centered(layout)
}
#[async_std::main]
async fn main() -> Result<()> {
// Describe the main window
let main_window = WindowDesc::new(build_ui())
.title("Title Tracker")
.window_size((400.0, 400.0));
// Create starting app state
let init_state = Init {
tag: String::from("#"),
};
// Start application
AppLauncher::with_window(main_window)
.launch(init_state)
.expect("Failed to launch application");
one_scrape().await
}

Trouble Pinning Async Fn Object Using Tokio

I seem unable to use pnet's datalink library in any asynchronous way. The trouble appears to relate to pinning. This snippet adapted from pnet's main example:
async fn ethernet_channel(i: NetworkInterface) {
// Create Ethernet channel:
let (mut tx, mut rx) = match datalink::channel(&i, Default::default()) {
Ok(Ethernet(tx, rx)) => (tx, rx),
_ => panic!("Error creating channel")
};
loop { // handle inbound Ethernet packets forever
tokio::select! {
Ok(packet) = rx.next() => {}
// TODO: include some arm like an mpsc oneshot to break out, like Tokio's examples
}
}
}
Leads to compiler error: ror[E0599]: no method named 'poll' found for struct 'Pin<&mut std::result::Result<&[u8], std::io::Error>>' in the current scope
The Tokio tutorial covers this exact scenario and includes this remark: "to .await a reference, the value being referenced must be pinned or implement Unpin."
Combined with knowing the return value of the rx.next() function is the Result<&[u8], Error> mentioned in the error, I gather the packet byte array is the culprit "value being referenced" that must be pinned. I've tried many combinations of tokio::pin!(), including what makes the most sense to me based on Tokio's same example, to no avail:
#[tokio::main]
pub async fn main() -> Result<(), Box<dyn Error>> {
for interface in datalink::interfaces() {
let operation = ethernet_channel(interface);
tokio::pin!(operation);
tokio::select! {
_ = &mut operation => {}
}
}
}
I've also tried tokio::pin!() within my "ethernet_channel()" function, before and after datalink::channel(), on both the NetworkInterface and rx. I always end up with the same error as above. Any guidance appreciated, pulling my hair out.

Why does tokio::spawn have a delay when called next to crossbeam_channel::select?

I'm creating a task which will spawn other tasks. Some of them will take some time, so they cannot be awaited, but they can run in parallel:
src/main.rs
use crossbeam::crossbeam_channel::{bounded, select};
#[tokio::main]
async fn main() {
let (s, r) = bounded::<usize>(1);
tokio::spawn(async move {
let mut counter = 0;
loop {
let loop_id = counter.clone();
tokio::spawn(async move { // why this one was not fired?
println!("inner task {}", loop_id);
}); // .await.unwrap(); - solves issue, but this is long task which cannot be awaited
println!("loop {}", loop_id);
select! {
recv(r) -> rr => {
// match rr {
// Ok(ee) => {
// println!("received from channel {}", loop_id);
// tokio::spawn(async move {
// println!("received from channel task {}", loop_id);
// });
// },
// Err(e) => println!("{}", e),
// };
},
// more recv(some_channel) ->
}
counter = counter + 1;
}
});
// let s_clone = s.clone();
// tokio::spawn(async move {
// s_clone.send(2).unwrap();
// });
loop {
// rest of the program
}
}
I've noticed strange behavior. This outputs:
loop 0
I was expecting it to also output inner task 0.
If I send a value to channel, the output will be:
loop 0
inner task 0
loop 1
This is missing inner task 1.
Why is inner task spawned with one loop of delay?
The first time I noticed such behavior with 'received from channel task' delayed one loop, but when I reduced code to prepare sample this started to happen with 'inner task'. It might be worth mentioning that if I write second tokio::spawn right to another, only the last one will have this issue. Is there something I should be aware when calling tokio::spawn and select!? What causes this one loop of delay?
Cargo.toml dependencies
[dependencies]
tokio = { version = "0.2", features = ["full"] }
crossbeam = "0.7"
Rust 1.46, Windows 10
select! is blocking, and the docs for tokio::spawn say:
The spawned task may execute on the current thread, or it may be sent to a different thread to be executed.
In this case, the select! "future" is actually a blocking function, and spawn doesn't use a new thread (either in the first invocation or the one inside the loop).
Because you don't tell tokio that you are going to block, tokio doesn't think another thread is needed (from tokio's perspective, you only have 3 futures which should never block, so why would you need another thread anyway?).
The solution is to use the tokio::task::spawn_blocking for the select!-ing closure (which will no longer be a future, so async move {} is now move || {}).
Now tokio will know that this function actually blocks, and will move it to another thread (while keeping all the actual futures in other execution threads).
use crossbeam::crossbeam_channel::{bounded, select};
#[tokio::main]
async fn main() {
let (s, r) = bounded::<usize>(1);
tokio::task::spawn_blocking(move || {
// ...
});
loop {
// rest of the program
}
}
Link to playground
Another possible solution is to use a non-blocking channel like tokio::sync::mpsc, on which you can use await and get the expected behavior, like this playground example with direct recv().await or with tokio::select!, like this:
use tokio::sync::mpsc;
#[tokio::main]
async fn main() {
let (mut s, mut r) = mpsc::channel::<usize>(1);
tokio::spawn(async move {
loop {
// ...
tokio::select! {
Some(i) = r.recv() => {
println!("got = {}", i);
}
}
}
});
loop {
// rest of the program
}
}
Link to playground

How to create a global mutable bool status flag

Preface: I have done my research and know that it is really not a good idea/nor is it idiomatic Rust to have one. Completely open to suggestions of other ways to solve this issue.
Background: I have a console application that connects to a websocket and once connected successfully, the server sends a "Connected" message. I have the sender, and the receiver is separate threads and all is working great. After the connect() call a loop begins and places a prompt in the terminal, signaling that the application is ready to receive input from the user.
Problem: The issue is that the current flow of execution calls connect, and then immediately displays the prompt, and then the application receives the message from the server stating it is connected.
How I would solve this in higher level languages: Place a global bool (we'll call it ready) and once the application is "ready" then display the prompt.
How I think this might look in Rust:
//Possible global ready flag with 3 states (true, false, None)
let ready: Option<&mut bool> = None;
fn main(){
welcome_message(); //Displays a "Connecting..." message to the user
//These are special callback I created and basically when the
//message is received the `connected` is called.
//If there was an error getting the message (service is down)
//then `not_connected` is called. *This is working code*
let p = mylib::Promise::new(connected, not_connected);
//Call connect and start websocket send and receive threads
mylib::connect(p);
//Loop for user input
loop {
match ready {
Some(x) => {
if x == true { //If ready is true, display the prompt
match prompt_input() {
true => {},
false => break,
}
} else {
return; //If ready is false, quit the program
}
},
None => {} //Ready is None, so continue waiting
}
}
}
fn connected() -> &mut bool{
println!("Connected to Service! Please enter a command. (hint: help)\n\n");
true
}
fn not_connected() -> &mut bool{
println!("Connection to Service failed :(");
false
}
Question:
How would you solve this issue in Rust? I have tried passing it around to all the libraries method calls, but hit some major issues about borrowing an immutable object in a FnOnce() closure.
It really sounds like you want to have two threads that are communicating via channels. Check out this example:
use std::thread;
use std::sync::mpsc;
use std::time::Duration;
enum ConsoleEvent {
Connected,
Disconnected,
}
fn main() {
let (console_tx, console_rx) = mpsc::channel();
let socket = thread::spawn(move || {
println!("socket: started!");
// pretend we are taking time to connect
thread::sleep(Duration::from_millis(300));
println!("socket: connected!");
console_tx.send(ConsoleEvent::Connected).unwrap();
// pretend we are taking time to transfer
thread::sleep(Duration::from_millis(300));
println!("socket: disconnected!");
console_tx.send(ConsoleEvent::Disconnected).unwrap();
println!("socket: closed!");
});
let console = thread::spawn(move || {
println!("console: started!");
for msg in console_rx.iter() {
match msg {
ConsoleEvent::Connected => println!("console: I'm connected!"),
ConsoleEvent::Disconnected => {
println!("console: I'm disconnected!");
break;
}
}
}
});
socket.join().expect("Unable to join socket thread");
console.join().expect("Unable to join console thread");
}
Here, there are 3 threads at play:
The main thread.
A thread to read from the "socket".
A thread to interface with the user.
Each of these threads can maintain it's own non-shared state. This allows reasoning about each thread to be easier. The threads use a channel to send updates between them safely. The data that crosses threads is encapsulated in an enum.
When I run this, I get
socket: started!
console: started!
socket: connected!
console: I'm connected!
socket: disconnected!
socket: closed!
console: I'm disconnected!

Resources