Async recursive function that takes a mutex - asynchronous

How do you create an async recursive function that takes a mutex? Rust claims that this code holds a mutex across an await point. However, the value is dropped before the .await.
#[async_recursion]
async fn f(mutex: &Arc<Mutex<u128>>) {
let mut unwrapped = mutex.lock().unwrap();
*unwrapped += 1;
let value = *unwrapped;
drop(unwrapped);
tokio::time::sleep(tokio::time::Duration::from_millis(1000)).await;
if value < 100 {
f(mutex);
}
}
Error
future cannot be sent between threads safely
within `impl futures::Future<Output = ()>`, the trait `std::marker::Send` is not implemented for `std::sync::MutexGuard<'_, u128>`
required for the cast to the object type `dyn futures::Future<Output = ()> + std::marker::Send`
lib.rs(251, 65): future is not `Send` as this value is used across an await

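In this case, you can restructure the code so that unwrapped cannot be used across an await. The compiler's check is conservative: in the version that produced this error it goes by the guard's lexical scope, so an explicit drop() is not enough, but ending the scope before the .await is:
let value = {
    // the guard lives only inside this block, so it is gone before the .await
    let mut unwrapped = mutex.lock().unwrap();
    *unwrapped += 1;
    *unwrapped
};
tokio::time::sleep(tokio::time::Duration::from_millis(1000)).await;
if value < 100 {
    // await the recursive call; without .await the returned future never runs
    f(mutex).await;
}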
If you weren't able to do this, then you'd need to make it so you don't return a Future that implements Send. The async_recursion docs specify an option you can pass to the macro to disable the Send bound it adds:
#[async_recursion(?Send)]
async fn f(mutex: &Arc<Mutex<u128>>) {
...
You wouldn't be able to send such a Future across threads though.
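As a variation on the scoped-block restructuring above, the locking can be moved into a small synchronous helper so the guard can never outlive the call. This is only a sketch using the standard library: the names bump, step and assert_send are hypothetical, and std::future::ready stands in for the tokio sleep so the example has no external dependencies.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical helper: lock, increment, and return a copy of the value.
// The guard is dropped when this function returns, before any .await.
fn bump(mutex: &Mutex<u128>) -> u128 {
    let mut guard = mutex.lock().unwrap();
    *guard += 1;
    *guard
}

// Stand-in for one step of the recursive function from the question.
async fn step(mutex: Arc<Mutex<u128>>) -> u128 {
    let value = bump(&mutex);
    // Placeholder await point (the question uses tokio::time::sleep here).
    std::future::ready(()).await;
    value
}

// Compiles only if T is Send, which is the property the macro requires.
fn assert_send<T: Send>(_: &T) {}

fn main() {
    let mutex = Arc::new(Mutex::new(0u128));
    let future = step(Arc::clone(&mutex));
    // No MutexGuard is stored in the future, so it is Send.
    assert_send(&future);
    drop(future);
    println!("{}", bump(&mutex));
}
```

Because the future only ever stores the Arc and the copied u128, the Send bound added by #[async_recursion] should be satisfiable without the ?Send option.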

Related

Parallel work stealing in arbitrary order in Rust

I'm trying to write a parallel data loader for deep learning in Rust. The task is to write an iterator that, under the hood, does the following:
1. Reads files from disk and applies some compute-heavy preprocessing to them; the result is generally a numeric array (or several).
2. Groups the results of the previous step into batches of size B and "collates" them. This generally means just concatenating the arrays, which is moderately compute-heavy.
3. Yields the results from step 2.
Step 1 can be both IO and compute bound, depending on network latency, size of files and complexity of preprocessing. It has to be run in parallel by many workers. Step 2 should be off the main thread but likely doesn't need a pool of workers. Step 3 happens on main thread (exposed to Python).
The reason I write it in Rust is that Python offers two options: pure Python implementation shipped with PyTorch, based on multiprocessing, which is somewhat slow but very flexible (arbitrary user-defined data preprocessing and batching) and C++ implementation shipped with Tensorflow, which is assembled by the user from a set of predefined primitives. The latter is substantially faster but too restrictive for the kinds of data processing I wish to do. I expect that Rust will give me the speed of Tensorflow with flexibility of arbitrary code as in PyTorch.
My question is purely about the way to implement parallelism. The ideal setup is to have N workers for step 1) -> channel -> worker for step 2) -> channel -> step 3. Because the iterator object may be dropped at any time, there is a strict requirement to be able to terminate the whole scheme after Drop. On the other hand, there is the flexibility of loading the files in an arbitrary order: for example if the batch size B == 16 and max_n_threads == 32, it is perfectly fine to start 32 workers and yield the first batch containing the 16 examples which happen to return first. This can be exploited for speed.
My naive implementation creates the DataLoader in 3 steps:
1. Create an n_working: Arc<AtomicUsize> to control the number of active worker threads and a should_shutdown: Arc<AtomicBool> to signal shutdown (when Drop is called).
2. Create a thread responsible for maintaining the pool. It spins on n_working < max_n_threads and keeps spawning worker threads, which terminate on should_shutdown and otherwise fetch a single example, send it down the worker->batcher channel and decrement n_working.
3. Create a batching thread which polls the worker->batcher channel and, upon receiving B objects, concatenates them into a batch and sends it down the batcher->yielder channel.
#[pyclass]
struct DataLoader {
collate_worker: Option<thread::JoinHandle<()>>,
example_worker: Option<thread::JoinHandle<()>>,
should_shut_down: Arc<AtomicBool>,
receiver: Receiver<Batch>,
length: usize,
}
impl DataLoader {
fn new(
dataset: Dataset,
batch_size: usize,
capacity: usize,
) -> Self {
let n_batches = dataset.len() / batch_size;
let max_n_threads = capacity * batch_size;
let (example_sender, collate_receiver) = bounded((batch_size - 1) * capacity);
let should_shut_down = Arc::new(AtomicBool::new(false));
let shutdown_flag = should_shut_down.clone();
let example_worker = thread::spawn(move || {
rayon::scope_fifo(|s| {
let dataset = &dataset;
let n_working = Arc::new(AtomicUsize::new(0));
let mut current_index = 0;
while current_index < n_batches * batch_size {
if n_working.load(Ordering::Relaxed) == max_n_threads {
continue;
}
if shutdown_flag.load(Ordering::Relaxed) {
break;
}
let index = current_index.clone();
let sender = example_sender.clone();
let counter = n_working.clone();
let shutdown_flag = shutdown_flag.clone();
s.spawn_fifo(move |_s| {
let example = dataset.get_example(index);
if !shutdown_flag.load(Ordering::Relaxed) {
_ = sender.send(example);
} // if we should shut down, skip sending
counter.fetch_sub(1, Ordering::Relaxed);
});
current_index += 1;
n_working.fetch_add(1, Ordering::Relaxed);
};
});
});
let (batch_sender, final_receiver) = bounded(capacity);
let shutdown_flag = should_shut_down.clone();
let collate_worker = thread::spawn(move || {
'outer: loop {
let mut batch = vec![];
for _ in 0..batch_size {
if let Ok(example) = collate_receiver.recv() {
batch.push(example);
} else {
break 'outer;
}
};
let collated = collate(batch);
if shutdown_flag.load(Ordering::Relaxed) {
break; // skip sending
}
_ = batch_sender.send(collated);
};
});
Self {
collate_worker: Some(collate_worker),
example_worker: Some(example_worker),
should_shut_down: should_shut_down,
receiver: final_receiver,
length: n_batches,
}
}
}
#[pymethods]
impl DataLoader {
fn __iter__(slf: PyRef<Self>) -> PyRef<Self> { slf }
fn __next__(&mut self) -> Option<Batch> {
self.receiver.recv().ok()
}
fn __len__(&self) -> usize {
self.length
}
}
impl Drop for DataLoader {
fn drop(&mut self) {
self.should_shut_down.store(true, Ordering::Relaxed);
if self.collate_worker.take().unwrap().join().is_err() {
println!("Panic in collate worker");
};
if self.example_worker.take().unwrap().join().is_err() {
println!("Panic in example_worker");
};
println!("dropped the dataloader");
}
}
This implementation works and roughly matches the performance of PyTorch but provides no significant speedup. I don't know where to look for improvements, but I imagine it would help to have the thing load-balance automatically in a work-stealing way and to flexibly spawn workers depending on the proportion of IO and compute time. I am also expecting performance issues due to the spinning pool manager and likely corner cases in my handling of Drop.
My question is how to best approach the problem. I am generally unsure if this should be tackled with parallel crates like rayon, async crates like tokio, or a mix of both. I also have the hunch my implementation could be much simpler with the correct use of their combinators/higher order APIs. I tried with rayon but I couldn't get a solution which doesn't wastefully enforce the original sequential returning order and respects the Drop requirement.
Okay I think I've figured out a solution for you that uses rayon parallel iterators.
The trick is to use Results in the rayon iterators, and return Err if the cancellation flag is set.
I first created a utility type to create a cancellable thread in which you can execute rayon iterators. You use it by passing in the thread closure which takes the atomic cancellation token as a parameter. Then you have to check if the cancellation token is true, and if so, exit early.
use std::sync::Arc;
use std::sync::atomic::{Ordering, AtomicBool};
use std::thread::JoinHandle;
fn collate(batch: &[Computed]) -> Batch {
batch.iter().map(|&x| i128::from(x)).sum()
}
#[derive(Debug)]
struct Cancelled;
struct CancellableThread<Output: Send + 'static> {
cancel_token: Arc<AtomicBool>,
thread: Option<JoinHandle<Result<Output, Cancelled>>>,
}
impl<Output: Send + 'static> CancellableThread<Output> {
fn new<F: FnOnce(Arc<AtomicBool>) -> Result<Output, Cancelled> + Send + 'static>(init: F) -> Self {
let cancel_token = Arc::new(AtomicBool::new(false));
let thread_cancel_token = Arc::clone(&cancel_token);
CancellableThread {
thread: Some(std::thread::spawn(move || init(thread_cancel_token))),
cancel_token,
}
}
fn output(mut self) -> Output {
self.thread.take().unwrap().join().unwrap().unwrap()
}
}
impl<Output: Send + 'static> Drop for CancellableThread<Output> {
fn drop(&mut self) {
self.cancel_token.store(true, Ordering::Relaxed);
if let Some(thread) = self.thread.take() {
let _ = thread.join().unwrap();
}
}
}
I found it useful to create a closure that returns a Result<(), Cancelled> so I could use the try operator (?) to exit early.
CancellableThread::new(move |cancel_token| {
let cancelled = || if cancel_token.load(Ordering::Relaxed) {
Err(Cancelled)
} else {
Ok(())
};
loop {
// was the thread dropped?
// if so, stop what we're doing
cancelled()?;
// do stuff and
// eventually return a result
}
});
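To make the early-exit pattern above concrete, here is a minimal standard-library sketch. The function sum_until_cancelled is hypothetical and only stands in for "do stuff"; the point is that the closure is called (cancelled()) and its Result is propagated with ?.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, PartialEq)]
struct Cancelled;

// Hypothetical workload: add up the items, checking the token before each one.
fn sum_until_cancelled(items: &[u64], cancel_token: &AtomicBool) -> Result<u64, Cancelled> {
    // Same shape as the closure above: Err once the token has been set.
    let cancelled = || {
        if cancel_token.load(Ordering::Relaxed) {
            Err(Cancelled)
        } else {
            Ok(())
        }
    };
    let mut total = 0u64;
    for item in items {
        // Returns Err(Cancelled) from the whole function if the flag is set.
        cancelled()?;
        total += item;
    }
    Ok(total)
}

fn main() {
    let token = AtomicBool::new(false);
    println!("{:?}", sum_until_cancelled(&[1, 2, 3], &token));
    token.store(true, Ordering::Relaxed);
    println!("{:?}", sum_until_cancelled(&[1, 2, 3], &token));
}
```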
I then used that CancellableThread abstraction in the DataLoader. No need to create a special Drop impl for it, because by default, it will call drop on each field anyways, which will handle the cancellation.
type Data = Vec<u8>;
type Dataset = Vec<Data>;
type Computed = u64;
type Batch = i128;
use rayon::prelude::*;
use crossbeam::channel::{unbounded, Receiver};
struct DataLoader {
example_worker: CancellableThread<()>,
collate_worker: CancellableThread<()>,
receiver: Receiver<Batch>,
length: usize,
}
I used unbounded channels, as it was one less thing to bother about. It shouldn't be hard to switch to bounded ones instead.
impl DataLoader {
fn new(dataset: Dataset, batch_size: usize) -> Self {
let (example_sender, collate_receiver) = unbounded();
let (batch_sender, final_receiver) = unbounded();
I'm not sure if you can always guarantee that the number of items in your dataset will be a multiple of the batch_size, so I decided to handle that explicitly.
let length = if dataset.len() % batch_size == 0 {
dataset.len() / batch_size
} else {
dataset.len() / batch_size + 1
};
I created the collating worker first, though that may not be necessary. As you can see, I had to duplicate a little bit to handle partial batches.
let collate_worker = CancellableThread::new(move |cancel_token| {
let cancelled = || if cancel_token.load(Ordering::Relaxed) {
Err(Cancelled)
} else {
Ok(())
};
'outer: loop {
let mut batch = Vec::with_capacity(batch_size);
for _ in 0..batch_size {
cancelled()?;
if let Ok(data) = collate_receiver.recv() {
batch.push(data);
} else {
if !batch.is_empty() {
// handle the last batch, if there
// weren't enough items to fill it
let collated = collate(&batch);
cancelled()?;
batch_sender.send(collated).unwrap();
}
break 'outer;
}
}
let collated = collate(&batch);
cancelled()?;
batch_sender.send(collated).unwrap();
}
Ok(())
});
The example worker is where things are really made much simpler, because we can just use rayon parallel iterators. As you can see, we check for cancellation before each heavy computation.
let example_worker = CancellableThread::new(move |cancel_token| {
let cancelled = || if cancel_token.load(Ordering::Relaxed) {
Err(Cancelled)
} else {
Ok(())
};
let heavy_compute = |data: Data| -> Result<Computed, Cancelled> {
cancelled()?;
Ok(data.iter().map(|&x| u64::from(x)).product())
};
dataset
.into_par_iter()
.map(heavy_compute)
.try_for_each(|computed| {
example_sender.send(computed?).unwrap();
Ok(())
})
});
Then we just construct the DataLoader. You can see the Python impl is identical:
DataLoader {
example_worker,
collate_worker,
receiver: final_receiver,
length,
}
}
}
// #[pymethods]
impl DataLoader {
fn __iter__(this: Self /* PyRef<Self> */) -> Self /* PyRef<Self> */ { this }
fn __next__(&mut self) -> Option<Batch> {
self.receiver.recv().ok()
}
fn __len__(&self) -> usize {
self.length
}
}
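On the earlier remark about switching to bounded channels: the sketch below shows the backpressure a bounded channel gives you, using the standard library's sync_channel as a stand-in for crossbeam's bounded (which the question already uses). The function fill_until_full is hypothetical and exists only to show that sends stop being accepted once the buffer is full.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

// Hypothetical demo: count how many items a bounded channel accepts
// when nothing is receiving from it.
fn fill_until_full(capacity: usize) -> usize {
    // Keep the receiver alive so the channel is not disconnected.
    let (sender, _receiver) = sync_channel::<u64>(capacity);
    let mut accepted = 0;
    loop {
        match sender.try_send(accepted as u64) {
            Ok(()) => accepted += 1,
            // The buffer is full: a blocking send() would wait here instead.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

fn main() {
    println!("{}", fill_until_full(4));
}
```

With a blocking send, a full buffer makes the producers wait for the consumer, which bounds how many computed examples can pile up in memory.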

Unable to change pathBuff/path variable in async function

I was unsure if I should post this here or in code review.
Code review seems to have only functioning code.
So I've a multitude of problems I don't really understand.
(I’m a noob) full code can be found here: https://github.com/NicTanghe/winder/blob/main/src/main.rs
main problem is here:
let temp = location_loc1.parent().unwrap();
location_loc1.push(&temp);
I’ve tried various things to get around problems with borrowing as mutable or as reference,
and I can’t seem to get it to work.
I just get a different set of errors with everything I try.
Furthermore, I'm sorry if this is a duplicate, but looking for separate solutions to the errors just gave me a different error. In a circle.
Full function
async fn print_events(mut selector_loc1:i8, location_loc1: PathBuf) {
let mut reader = EventStream::new();
loop {
//let delay = Delay::new(Duration::from_millis(1_000)).fuse();
let mut event = reader.next().fuse();
select! {
// _ = delay => {
// print!("{esc}[2J{esc}[1;1H{}", esc = 27 as char,);
// },
maybe_event = event => {
match maybe_event {
Some(Ok(event)) => {
//println!("Event::{:?}\r", event);
// if event == Event::Mouse(MouseEvent::Up("Left").into()) {
// println!("Cursor position: {:?}\r", position());
// }
print!("{esc}[2J{esc}[1;1H{}", esc = 27 as char,);
if event == Event::Key(KeyCode::Char('k').into()) {
if selector_loc1 > 0 {
selector_loc1 -= 1;
};
//println!("go down");
//println!("{}",selected)
} else if event == Event::Key(KeyCode::Char('j').into()) {
selector_loc1 += 1;
//println!("go up");
//println!("{}",selected)
} else if event == Event::Key(KeyCode::Char('h').into()) {
//-----------------------------------------
//-------------BackLogic-------------------
//-----------------------------------------
let temp = location_loc1.parent().unwrap();
location_loc1.push(&temp);
//------------------------------------------
//------------------------------------------
} else if event == Event::Key(KeyCode::Char('l').into()) {
//go to next dir
} if event == Event::Key(KeyCode::Esc.into()) {
break;
}
printtype(location_loc1,selector_loc1);
}
Some(Err(e)) => println!("Error: {:?}\r", e),
None => break,
}
}
};
}
}
also, it seems using
use async_std::path::{Path, PathBuf};
makes the rust not recognize unwrap() function → how would I use using ?
There are two problems with your code.
Your PathBuf is immutable. It's not possible to modify immutable objects, unless they support interior mutability. PathBuf does not. Therefore you have to make your variable mutable. You can either add mut in front of it like that:
async fn print_events(mut selector_loc1:i8, mut location_loc1: PathBuf) {
Or you can re-bind it:
let mut location_loc1 = location_loc1;
You cannot borrow it both mutably and immutably at the same time: mutable borrows are exclusive! Given that the method .parent() borrows the buffer, you have to create a temporary owned value:
// the PathBuf instance
let mut path = PathBuf::from("root/parent/child");
// notice the .map(|p| p.to_owned()) method - it helps us avoid the immutable borrow
let parent = path.parent().map(|p| p.to_owned()).unwrap();
// now it's fine to modify it, as it's not borrowed
path.push(parent);
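A side note, assuming the "BackLogic" branch in the question is meant to move up one directory: pushing the parent onto a relative path appends it to the existing path rather than going back. PathBuf::pop truncates the buffer to its parent in place, so no temporary borrow is needed. The helper go_to_parent below is a hypothetical name used only for this sketch.

```rust
use std::path::PathBuf;

// Hypothetical helper: return the path moved up one directory.
fn go_to_parent(mut path: PathBuf) -> PathBuf {
    // pop() truncates the buffer to its parent in place; it returns false
    // and leaves the path unchanged when there is no parent.
    path.pop();
    path
}

fn main() {
    let path = PathBuf::from("root/parent/child");
    let parent = go_to_parent(path);
    println!("{}", parent.display());
}
```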
Your second question:
also, it seems using use async_std::path::{Path, PathBuf}; makes the rust not recognize unwrap() function → how would I use using ?
The async-std version is just a wrapper over std's PathBuf. It delegates to the standard implementation, so it should not behave differently:
// copied from async-std's PathBuf implementation
pub struct PathBuf {
inner: std::path::PathBuf,
}

Rust: Joining and iterating over futures' results

I have some code that iterates over objects and uses an async method on each of them sequentially before doing something with the results. I'd like to change it so that the async method calls are joined into a single future before being executed. The important bit below is in HolderStruct::add_squares. My current code looks like this:
use anyhow::Result;
struct AsyncMethodStruct {
value: u64
}
impl AsyncMethodStruct {
fn new(value: u64) -> Self {
AsyncMethodStruct {
value
}
}
async fn get_square(&self) -> Result<u64> {
Ok(self.value * self.value)
}
}
struct HolderStruct {
async_structs: Vec<AsyncMethodStruct>
}
impl HolderStruct {
fn new(async_structs: Vec<AsyncMethodStruct>) -> Self {
HolderStruct {
async_structs
}
}
async fn add_squares(&self) -> Result<u64> {
let mut squares = Vec::with_capacity(self.async_structs.len());
for async_struct in self.async_structs.iter() {
squares.push(async_struct.get_square().await?);
}
let mut sum = 0;
for square in squares.iter() {
sum += square;
}
return Ok(sum);
}
}
I'd like to change HolderStruct::add_squares to something like this:
use futures::future::join_all;
// [...]
impl HolderStruct {
async fn add_squares(&self) -> Result<u64> {
let mut square_futures = Vec::with_capacity(self.async_structs.len());
for async_struct in self.async_structs.iter() {
square_futures.push(async_struct.get_square());
}
let square_results = join_all(square_futures).await;
let mut sum = 0;
for square_result in square_results.iter() {
sum += square_result?;
}
return Ok(sum);
}
}
However, the compiler gives me this error using the above:
error[E0277]: the `?` operator can only be applied to values that implement `std::ops::Try`
--> src/main.rs:46:20
|
46 | sum += square_result?;
| ^^^^^^^^^^^^^^ the `?` operator cannot be applied to type `&std::result::Result<u64, anyhow::Error>`
|
= help: the trait `std::ops::Try` is not implemented for `&std::result::Result<u64, anyhow::Error>`
= note: required by `std::ops::Try::into_result`
How would I change the code to not have this error?
for square_result in square_results.iter()
Lose the iter() call here.
for square_result in square_results
You seem to be under the impression that calling iter() is mandatory to iterate over a collection. Actually, anything that implements IntoIterator can be used in a for loop.
Calling iter() on a Vec<T> derefs to a slice (&[T]) and yields an iterator over references to the vector's elements. The ? operator tries to take the value out of the Result, but that is only possible if you own the Result rather than just having a reference to it.
However, if you simply use a vector itself in a for statement, it will use the IntoIterator implementation for Vec<T> which will yield items of type T rather than &T.
square_results.into_iter() does the same thing, albeit more verbosely. It is mostly useful when using iterators in a functional style, a la vector.into_iter().map(|x| x + 1).collect().
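The difference can be seen without any async machinery. In the sketch below, a plain Vec<Result<u64, String>> stands in for the vector returned by join_all, and String stands in for anyhow::Error; both substitutions are assumptions made to keep the example dependency-free. The second function shows the functional style, relying on the standard library's ability to sum an iterator of Results into a single Result.

```rust
// Iterating the vector by value yields owned Results, so `?` applies.
fn add_squares(square_results: Vec<Result<u64, String>>) -> Result<u64, String> {
    let mut sum = 0;
    for square_result in square_results {
        sum += square_result?;
    }
    Ok(sum)
}

// Functional variant: summing an iterator of Results gives a Result,
// stopping at the first Err.
fn add_squares_functional(square_results: Vec<Result<u64, String>>) -> Result<u64, String> {
    square_results.into_iter().sum()
}

fn main() {
    let results = vec![Ok(1), Ok(4), Ok(9)];
    println!("{:?}", add_squares(results.clone()));
    println!("{:?}", add_squares_functional(results));
}
```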

Rust Async doesn't execute in parallel for sockets

I'm trying to send and receive simultaneously to a multicast IP with Rust.
use futures::executor::block_on;
use async_std::task;
use std::{net::{UdpSocket, Ipv4Addr}, time::{Duration, Instant}};
fn main() {
let future = async_main();
block_on(future);
}
async fn async_main() {
let mut socket = UdpSocket::bind("0.0.0.0:8888").unwrap();
let multi_addr = Ipv4Addr::new(234, 2, 2, 2);
let inter = Ipv4Addr::new(0,0,0,0);
socket.join_multicast_v4(&multi_addr,&inter);
let async_one = first(&socket);
let async_two = second(&socket);
futures::join!(async_one, async_two);
}
async fn first(socket: &std::net::UdpSocket) {
let mut buf = [0u8; 65535];
let now = Instant::now();
loop {
if now.elapsed().as_secs() > 10 { break; }
let (amt, src) = socket.recv_from(&mut buf).unwrap();
println!("received {} bytes from {:?}", amt, src);
}
}
async fn second(socket: &std::net::UdpSocket) {
let now = Instant::now();
loop {
if now.elapsed().as_secs() > 10 { break; }
socket.send_to(String::from("h").as_bytes(), "234.2.2.2:8888").unwrap();
}
}
The issue with this is first it runs the receive function and then it runs the send function, it never sends and receives simultaneously. With Golang I can do this with Goroutines but I'm finding this quite difficult in Rust.
I'm not very experienced with async in Rust, but your first() and second() functions don't appear to have any asynchronous calls in them -- in other words, there are not any calls that use .await. My understanding is that if nothing is awaited, then the functions will run synchronously, and I believe you get a compiler warning about it as well.
It doesn't look like std::net::UdpSocket provides any async methods that can be awaited, and you need to use async_std::net::UdpSocket instead.
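The point about missing .await calls can be checked by polling futures by hand. This is a standard-library sketch, not how an executor is normally used: NoopWake, YieldOnce, no_await, with_await and first_poll are all hypothetical names, and YieldOnce merely stands in for a real async I/O call such as an async socket read.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing; enough for polling a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Stand-in for an async I/O call: pending on the first poll, ready afterwards.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// Like first() and second() in the question: async, but nothing is awaited.
async fn no_await() -> u32 {
    (0..1000u32).sum::<u32>()
}

// Contains an await point, so it can hand control back to the executor.
async fn with_await() -> u32 {
    YieldOnce(false).await;
    1
}

// Poll a future exactly once and report what happened.
fn first_poll<F: Future>(future: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(future);
    future.as_mut().poll(&mut cx)
}

fn main() {
    // Runs to completion inside a single poll: nothing else can interleave.
    println!("{:?}", first_poll(no_await()));
    // Returns Pending at the await point, giving other futures a turn.
    println!("{:?}", first_poll(with_await()));
}
```

A future with no await point finishes inside its first poll, so join! cannot switch to the other future until it is done. With blocking std::net::UdpSocket calls, that means the receive loop runs to its end before the send loop starts.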

Escaping closure setting views in DispatchQueue.main.async Swift 3

I'm dealing with some asynchronous functions and trying to update views. In short I have function 1 with asynchronous function that will return a string to be passed to function 2. I am updating views in both functions, on main thread. It all works but I need to understand if this is correct way.
class A {
    var varA = ""
    var varB = ""
    func f1(_ completion: @escaping (String) -> Void) {
        some asynchronous call ... { in
            ...
            DispatchQueue.main.async {
                self.varA = "something"
                self.labelA.text = self.varA
                completion(self.varA)
            }
        }
    }
    func f2(_ string: String) {
        another asynchronous call ... { in
            ...
            DispatchQueue.main.async {
                self.varB = string
                self.labelB.text = self.varB
            }
        }
    }
    // function call
    f1(completion: f2)
}
Three questions, 1) What is the right way to run a dependent function where there is wait for an asynchronous callback?
2) Is DispatchQueue.main.async needed to update views?
3) Is it ok to call async func in another async callback? Isn't there chance self may be nil in some cases if you are updating views in some escaping function?
I'm going to try helping you according to your questions:
Question 1) There are many right ways and each developer can have their own logic, but in this case, what I personally would probably do is something like this:
class A {
    func f1(_ completion: @escaping (String) -> Void) {
        some asynchronous call ... { in
            ...
            DispatchQueue.main.async { [weak self] in // 1
                guard let strongSelf = self else { return } // 2
                let varA = "something" // 3
                strongSelf.labelA.text = varA
                completion(varA) // 4
            }
        }
    }
    func f2(_ string: String) {
        another asynchronous call ... { in
            ...
            DispatchQueue.main.async {
                self.labelB.text = string // 5
            }
        }
    }
    // function call
    // 6
    f1({ [weak self] text in
        guard let strongSelf = self else { return }
        strongSelf.f2(text)
    })
}
1 - Here I'm using [weak self] to avoid retain cycles.
2 - Just unwrapping the optional self; in case it's nil, I'll just return.
3 - In your case, it's not really necessary to have class variables, so I'm just creating local variables inside the block.
4 - Finally, I'm calling the completion with the variable containing the string.
5 - I also don't really need to set a class variable in here, so I'm just updating the label text with the string provided as a parameter.
6 - Then, I just need to call the first function and use the completion block to call the second after the first one completes.
Question 2) Yes, you must call DispatchQueue.main to update the view. This way you're making sure that your code is executed on the main thread, which is crucial for anything interacting with the UI because it gives us a synchronization point, as you can read in Apple's documentation.
Question 3) Using [weak self] and guard let strongSelf = self else { return }, I'm avoiding retain cycles and the cases where self can be nil.
