Unable to change pathBuff/path variable in async function - asynchronous

I was unsure if I should post this here or in code review.
Code review seems to have only functioning code.
So I've a multitude of problems I don't really understand.
(I’m a noob) full code can be found here: https://github.com/NicTanghe/winder/blob/main/src/main.rs
main problem is here:
let temp = location_loc1.parent().unwrap();
location_loc1.push(&temp);
I’ve tried various things to get around problems with borrowing as mutable or as reference,
and I can’t seem to get it to work.
I just get a different set of errors with everything I try.
Furthermore, I'm sorry if this is a duplicate, but looking for separate solutions to the errors just gave me a different error. In a circle.
Full function
async fn print_events(mut selector_loc1:i8, location_loc1: PathBuf) {
let mut reader = EventStream::new();
loop {
//let delay = Delay::new(Duration::from_millis(1_000)).fuse();
let mut event = reader.next().fuse();
select! {
// _ = delay => {
// print!("{esc}[2J{esc}[1;1H{}", esc = 27 as char,);
// },
maybe_event = event => {
match maybe_event {
Some(Ok(event)) => {
//println!("Event::{:?}\r", event);
// if event == Event::Mouse(MouseEvent::Up("Left").into()) {
// println!("Cursor position: {:?}\r", position());
// }
print!("{esc}[2J{esc}[1;1H{}", esc = 27 as char,);
if event == Event::Key(KeyCode::Char('k').into()) {
if selector_loc1 > 0 {
selector_loc1 -= 1;
};
//println!("go down");
//println!("{}",selected)
} else if event == Event::Key(KeyCode::Char('j').into()) {
selector_loc1 += 1;
//println!("go up");
//println!("{}",selected)
} else if event == Event::Key(KeyCode::Char('h').into()) {
//-----------------------------------------
//-------------BackLogic-------------------
//-----------------------------------------
let temp = location_loc1.parent().unwrap();
location_loc1.push(&temp);
//------------------------------------------
//------------------------------------------
} else if event == Event::Key(KeyCode::Char('l').into()) {
//go to next dir
} if event == Event::Key(KeyCode::Esc.into()) {
break;
}
printtype(location_loc1,selector_loc1);
}
Some(Err(e)) => println!("Error: {:?}\r", e),
None => break,
}
}
};
}
}
also, it seems using
use async_std::path::{Path, PathBuf};
makes the rust not recognize unwrap() function → how would I use using ?

There are two problems with your code.
Your PathBuf is immutable. It's not possible to modify immutable objects, unless they support interior mutability. PathBuf does not. Therefore you have to make your variable mutable. You can either add mut in front of it like that:
async fn print_events(mut selector_loc1:i8, mut location_loc1: PathBuf) {
Or you can re-bind it:
let mut location_loc1 = location_loc1;
You cannot have borrow it both mutable and immutably - the mutable borrows are exclusive! Given that the method .parent() borrows the buffer, you have to create a temporary owned value:
// the PathBuf instance
let mut path = PathBuf::from("root/parent/child");
// notice the .map(|p| p.to_owned()) method - it helps us avoid the immutable borrow
let parent = path.parent().map(|p| p.to_owned()).unwrap();
// now it's fine to modify it, as it's not borrowed
path.push(parent);
Your second question:
also, it seems using use async_std::path::{Path, PathBuf}; makes the rust not recognize unwrap() function → how would I use using ?
The async-std version is just a wrapper over std's PathBuf. It just delegates to the standard implementation, so it should not behave differently
// copied from async-std's PathBuf implementation
pub struct PathBuf {
inner: std::path::PathBuf,
}

Related

Parallel work stealing in arbitrary order in Rust

I'm trying to write a parallel data loader for deep learning in Rust. The task is to write an iterator that under the hood does the following
Reads files from disk and applies some compute-heavy preprocessing to them, the result is generally a numeric array (or multiple)
Groups the results of the previous step into batches of size B and "collates" them - this generally means just concatenating the arrays - moderately compute heavy
Yields the results from step 2.
Step 1 can be both IO and compute bound, depending on network latency, size of files and complexity of preprocessing. It has to be run in parallel by many workers. Step 2 should be off the main thread but likely doesn't need a pool of workers. Step 3 happens on main thread (exposed to Python).
The reason I write it in Rust is that Python offers two options: pure Python implementation shipped with PyTorch, based on multiprocessing, which is somewhat slow but very flexible (arbitrary user-defined data preprocessing and batching) and C++ implementation shipped with Tensorflow, which is assembled by the user from a set of predefined primitives. The latter is substantially faster but too restrictive for the kinds of data processing I wish to do. I expect that Rust will give me the speed of Tensorflow with flexibility of arbitrary code as in PyTorch.
My question is purely about the way to implement parallelism. The ideal setup is to have N workers for step 1) -> channel -> worker for step 2) -> channel -> step 3. Because the iterator object may be dropped at any time, there is a strict requirement to be able to terminate the whole scheme after Drop. On the other hand, there is the flexibility of loading the files in an arbitrary order: for example if the batch size B == 16 and max_n_threads == 32, it is perfectly fine to start 32 workers and yield the first batch containing the 16 examples which happen to return first. This can be exploited for speed.
My naive implementation creates the DataLoader in 3 steps:
Create a n_working: Arc<AtomicUsize> to control the number of worker threads active and should_shutdown: Arc<AtomicBool> to signal shutdown (when Drop is called)
Create a thread responsible for maintaining the pool. It spins on n_working < max_n_threads and keeps spawning worker threads which terminate on should_shutdown, otherwise fetch a single example, send it down the worker->batcher channel and decrement n_working
Create a batching thread which polls the worker->batcher channel, upon receiving B objects concatenates them into a batch and sends down the batcher->yielder channel
#[pyclass]
struct DataLoader {
collate_worker: Option<thread::JoinHandle<()>>,
example_worker: Option<thread::JoinHandle<()>>,
should_shut_down: Arc<AtomicBool>,
receiver: Receiver<Batch>,
length: usize,
}
impl DataLoader {
fn new(
dataset: Dataset,
batch_size: usize,
capacity: usize,
) -> Self {
let n_batches = dataset.len() / batch_size;
let max_n_threads = capacity * batch_size;
let (example_sender, collate_receiver) = bounded((batch_size - 1) * capacity);
let should_shut_down = Arc::new(AtomicBool::new(false));
let shutdown_flag = should_shut_down.clone();
let example_worker = thread::spawn(move || {
rayon::scope_fifo(|s| {
let dataset = &dataset;
let n_working = Arc::new(AtomicUsize::new(0));
let mut current_index = 0;
while current_index < n_batches * batch_size {
if n_working.load(Ordering::Relaxed) == max_n_threads {
continue;
}
if shutdown_flag.load(Ordering::Relaxed) {
break;
}
let index = current_index.clone();
let sender = example_sender.clone();
let counter = n_working.clone();
let shutdown_flag = shutdown_flag.clone();
s.spawn_fifo(move |_s| {
let example = dataset.get_example(index);
if !shutdown_flag.load(Ordering::Relaxed) {
_ = sender.send(example);
} // if we should shut down, skip sending
counter.fetch_sub(1, Ordering::Relaxed);
});
current_index += 1;
n_working.fetch_add(1, Ordering::Relaxed);
};
});
});
let (batch_sender, final_receiver) = bounded(capacity);
let shutdown_flag = should_shut_down.clone();
let collate_worker = thread::spawn(move || {
'outer: loop {
let mut batch = vec![];
for _ in 0..batch_size {
if let Ok(example) = collate_receiver.recv() {
batch.push(example);
} else {
break 'outer;
}
};
let collated = collate(batch);
if shutdown_flag.load(Ordering::Relaxed) {
break; // skip sending
}
_ = batch_sender.send(collated);
};
});
Self {
collate_worker: Some(collate_worker),
example_worker: Some(example_worker),
should_shut_down: should_shut_down,
receiver: final_receiver,
length: n_batches,
}
}
}
#[pymethods]
impl DataLoader {
fn __iter__(slf: PyRef<Self>) -> PyRef<Self> { slf }
fn __next__(&mut self) -> Option<Batch> {
self.receiver.recv().ok()
}
fn __len__(&self) -> usize {
self.length
}
}
impl Drop for DataLoader {
fn drop(&mut self) {
self.should_shut_down.store(true, Ordering::Relaxed);
if self.collate_worker.take().unwrap().join().is_err() {
println!("Panic in collate worker");
};
if self.example_worker.take().unwrap().join().is_err() {
println!("Panic in example_worker");
};
println!("dropped the dataloader");
}
}
This implementation works and roughly matches the performance of PyTorch but provides no significant speedup. I don't know where to look for improvements, but I imagine it would help to have the thing load-balance automatically in a work-stealing way and to flexibly spawn workers depending on the proportion of IO and compute time. I am also expecting performance issues due to the spinning pool manager and likely corner cases in my handling of Drop.
My question is how to best approach the problem. I am generally unsure if this should be tackled with parallel crates like rayon, async crates like tokio, or a mix of both. I also have the hunch my implementation could be much simpler with the correct use of their combinators/higher order APIs. I tried with rayon but I couldn't get a solution which doesn't wastefully enforce the original sequential returning order and respects the Drop requirement.
Okay I think I've figured out a solution for you that uses rayon parallel iterators.
The trick is to use Results in the rayon iterators, and return Err if the cancellation flag is set.
I first created a utility type to create a cancellable thread in which you can execute rayon iterators. You use it by passing in the thread closure which takes the atomic cancellation token as a parameter. Then you have to check if the cancellation token is true, and if so, exit early.
use std::sync::Arc;
use std::sync::atomic::{Ordering, AtomicBool};
use std::thread::JoinHandle;
fn collate(batch: &[Computed]) -> Batch {
batch.iter().map(|&x| i128::from(x)).sum()
}
#[derive(Debug)]
struct Cancelled;
struct CancellableThread<Output: Send + 'static> {
cancel_token: Arc<AtomicBool>,
thread: Option<JoinHandle<Result<Output, Cancelled>>>,
}
impl<Output: Send + 'static> CancellableThread<Output> {
fn new<F: FnOnce(Arc<AtomicBool>) -> Result<Output, Cancelled> + Send + 'static>(init: F) -> Self {
let cancel_token = Arc::new(AtomicBool::new(false));
let thread_cancel_token = Arc::clone(&cancel_token);
CancellableThread {
thread: Some(std::thread::spawn(move || init(thread_cancel_token))),
cancel_token,
}
}
fn output(mut self) -> Output {
self.thread.take().unwrap().join().unwrap().unwrap()
}
}
impl<Output: Send + 'static> Drop for CancellableThread<Output> {
fn drop(&mut self) {
self.cancel_token.store(true, Ordering::Relaxed);
if let Some(thread) = self.thread.take() {
let _ = thread.join().unwrap();
}
}
}
I found it useful to create a closure that returns a Result<(), Cancelled> so I could use the try operator (?) to exit early.
CancellableThread::new(move |cancel_token| {
let cancelled = || if cancel_token.load(Ordering::Relaxed) {
Err(Cancelled)
} else {
Ok(())
};
loop {
// was the thread dropped?
// if so, stop what we're doing
cancelled?;
// do stuff and
// eventually return a result
}
});
I then used that CancellableThread abstraction in the DataLoader. No need to create a special Drop impl for it, because by default, it will call drop on each field anyways, which will handle the cancellation.
type Data = Vec<u8>;
type Dataset = Vec<Data>;
type Computed = u64;
type Batch = i128;
use rayon::prelude::*;
use crossbeam::channel::{unbounded, Receiver};
struct DataLoader {
example_worker: CancellableThread<()>,
collate_worker: CancellableThread<()>,
receiver: Receiver<Batch>,
length: usize,
}
I used unbounded channels, as it was one less thing to bother about. It shouldn't be hard to switch to bounded ones instead.
impl DataLoader {
fn new(dataset: Dataset, batch_size: usize) -> Self {
let (example_sender, collate_receiver) = unbounded();
let (batch_sender, final_receiver) = unbounded();
I'm not sure if you can always guarantee that the number of items in your dataset will be a multiple of the batch_size, so I decided to handle that explicitly.
let length = if dataset.len() % batch_size == 0 {
dataset.len() / batch_size
} else {
dataset.len() / batch_size + 1
};
I created the collating worker first, though that may not be necessary. As you can see, I had to duplicate a little bit to handle partial batches.
let collate_worker = CancellableThread::new(move |cancel_token| {
let cancelled = || if cancel_token.load(Ordering::Relaxed) {
Err(Cancelled)
} else {
Ok(())
};
'outer: loop {
let mut batch = Vec::with_capacity(batch_size);
for _ in 0..batch_size {
cancelled()?;
if let Ok(data) = collate_receiver.recv() {
batch.push(data);
} else {
if !batch.is_empty() {
// handle the last batch, if there
// weren't enough items to fill it
let collated = collate(&batch);
cancelled()?;
batch_sender.send(collated).unwrap();
}
break 'outer;
}
}
let collated = collate(&batch);
cancelled()?;
batch_sender.send(collated).unwrap();
}
Ok(())
});
The example worker is where things are really made much simpler, because we can just use rayon parallel iterators. As you can see, we check for cancellation before each heavy computation.
let example_worker = CancellableThread::new(move |cancel_token| {
let cancelled = || if cancel_token.load(Ordering::Relaxed) {
Err(Cancelled)
} else {
Ok(())
};
let heavy_compute = |data: Data| -> Result<Computed, Cancelled> {
cancelled()?;
Ok(data.iter().map(|&x| u64::from(x)).product())
};
dataset
.into_par_iter()
.map(heavy_compute)
.try_for_each(|computed| {
example_sender.send(computed?).unwrap();
Ok(())
})
});
Then we just construct the DataLoader. You can see the Python impl is identical:
DataLoader {
example_worker,
collate_worker,
receiver: final_receiver,
length,
}
}
}
// #[pymethods]
impl DataLoader {
fn __iter__(this: Self /* PyRef<Self> */) -> Self /* PyRef<Self> */ { this }
fn __next__(&mut self) -> Option<Batch> {
self.receiver.recv().ok()
}
fn __len__(&self) -> usize {
self.length
}
}
playground

Run background work and return "cancellation"-closure

I'm quite new to Rust, so this might be quite a beginner question: Assume I want to start some async action that runs in the background and with a function call I want to stop it. The desired API looks like this:
let stop = show_loadbar("loading some data...").await;
// load data...
stop().await;
My code:
pub fn show_loadbar(text: &str) -> Box<dyn FnOnce() -> Box<dyn Future<Output=()>>>
{
let (sender, receiver) = channel::bounded::<()>(0);
let display = task::spawn(async move {
while receiver.try_recv().is_err() {
// show loadbar: xxx.await;
}
// cleanup: yyy.await;
});
// return a function which stops the load bar
Box::new(move || {
Box::new(async {
sender.send(()).await;
display.await;
})
})
}
I played around quite a lot (creating a struct instead of a function and some combinations), but finally, I always get error like this:
error[E0373]: async block may outlive the current function, but it borrows `sender`, which is owned by the current function
--> src/terminal/loading.rs:23:24
|
23 | Box::new(async {
| ________________________^
24 | | sender.send(()).await;
| | ------ `sender` is borrowed here
25 | | display.await;
26 | | })
| |_________^ may outlive borrowed value `sender`
Given the described API, is it even possible to implement the function like this in Rust?
Independent of this, what is the Rust-way to do it? Maybe this interface is absolutely not how it should be done in Rust.
Thank you very much
The immediate error you see can be fixed by changing async to async move, so that it captures sender by value instead of by reference. But trying tu use your code reveals futher issues:
you can't (and probably don't need to) await show_loadbar(), since it's not itself async.
pin the boxed future to be able to await it.
a bounded async_std channel cannot have the capacity of 0 (it panics if given 0);
handle the error returned by sender.send(), e.g. by unwrapping it.
(optionally) get rid of the outer box by returning impl FnOnce(...) instead of Box<dyn FnOnce(...)>.
With these taken into account, the code would look like this:
pub fn show_loadbar(_text: &str) -> impl FnOnce() -> Pin<Box<dyn Future<Output = ()>>> {
let (sender, receiver) = channel::bounded::<()>(1);
let display = task::spawn(async move {
while receiver.try_recv().is_err() {
// show loadbar: xxx.await;
}
// cleanup: yyy.await;
});
// return a function which stops the load bar
|| {
Box::pin(async move {
sender.send(()).await.unwrap();
display.await;
})
}
}
// usage example:
async fn run() {
let cancel = show_loadbar("xxx");
task::sleep(Duration::from_secs(1)).await;
cancel().await;
}
fn main() {
task::block_on(run());
}

Escaping closure setting views in DispatchQueue.main.async Swift 3

I'm dealing with some asynchronous functions and trying to update views. In short I have function 1 with asynchronous function that will return a string to be passed to function 2. I am updating views in both functions, on main thread. It all works but I need to understand if this is correct way.
class A {
var varA = ""
var varB = ""
func f1 (_ completion: #escaping (String) -> void ){
some asynchronous call ... { in
...
DispatchQueue.main.async {
self.varA = "something"
sef.labelA.text = self.varA
completion(self.varA)
}
}
}
func f2 (_ string: String){
another asynchronous call ... { in
...
DispatchQueue.main.async {
self.varB = string
sef.labelB.text = self.varB
}
}
}
// funcation call
f1(completion: f2)
}
Three questions, 1) What is the right way to run a dependent function where there is wait for an asynchronous callback?
2) Is DispatchQueue.main.async needed to update views?
3) Is it ok to call async func in another async callback? Isn't there chance self may be nil in some cases if you are updating views in some escaping function?
I'm going to try helping you according to your questions:
Question 1) There are many right ways and each developer can have its own logic, but in this case, what I personally would probably do is something like this:
class A {
func f1 (_ completion: #escaping (String) -> void ){
some asynchronous call ... { in
...
DispatchQueue.main.async { [weak self] in // 1
guard let strongSelf = self else { return } // 2
let varA = "something" // 3
strongSelf.label.text = varA
completion(varA) // 4
}
}
}
func f2 (_ string: String){
another asynchronous call ... { in
...
DispatchQueue.main.async {
sef.labelB.text = string // 5
}
}
}
// function call
// 6
f1(completion: { [weak self] text in
guard let strongSelf = self else { return }
strongSelf.f2(text)
})
}
1 - Here I'm using [weak self] to avoid retain cycles.
2 - Just unwrapping the optional self, case it's nil, I'll just return.
3 - In your case, it's not really necessary to have class variables, so I'm just creating local variables inside the block.
4 - Finally, I'm calling the completion with the variable containing the string.
5 - I also don't really need to set a class variable in here, so I'm just updating the label text with the string provided as a paramater.
6 - Then, I just need to call the first function and use the completion block to call the second after the first one completes.
Question 2) Yes, you must call DispatchQueue.main to update the view. This way your making sure that your code will be executed in the main thread that is crucial for things interacting with UI because it allow us to have a sincronization point as you can read in Apple's documentation.
Question 3) Using [weak self] and guard let strongSelf = self else { return }, I'm avoiding retain cycles and the cases where self can be nil.

How to make Flow understand dynamic code that uses lodash for runtime type-checking?

Flow's dynamic code example indicates that Flow can figure out runtime type-checking:
function foo(x) {
if (typeof x === 'string') {
return x.length; // flow is smart enough to see this is safe
} else {
return x;
}
}
var res = foo('Hello') + foo(42);
But in real life, typeof isn't good enough. I usually use lodash's type-checking functions (_.isFunction, _.isString etc), which handle a lot of edge cases.
The problem is, if we change the example to use lodash for the runtime type-checking, Flow no longer understands it:
function foo(x) {
if (_.isString(x)) {
return x.length; // warning: `length` property not found in Number
} else {
return x;
}
}
var res = foo('Hello') + foo(42);
I tried using iflow-lodash but it doesn't seem to make a difference here.
What's the best solution to make Flow understand code that uses lodash for runtime type-checking? I'm new to Flow btw.
This would depend on having predicate types in your lodash libdefs.
Predicate types have recently been added to Flow. Although they are still in an experimental state so I would recommend being careful about their usage for anything serious for now.
function isString(x): boolean %checks { // << declare that the method is a refinement
return typeof x === 'string';
}
function method(x: string | number): number {
if (isString(x)) { // << valid refinement
return x.charCodeAt(0); // << no errors
} else {
return x;
}
}
[try it out]
Note: This answer may quickly fall out of date in one of the next releases as this is a brand new feature. Check out Flow's changelog for the latest information.
The solution for now if possible is to use the built-in refinements.
function method(x: string | number): number {
if (typeof x === "string") { // << Inline the check
return x.charCodeAt(0);
} else {
return x;
}
}
The most obvious solution for this specific case is:
if (_.isString(x) && typeof x === 'string') {
In general, you might be able to overcome Flow errors with creative error suppression, like this:
if (_.isString(x)) {
// #ManuallyTyped
var xStr: string = x;
return xStr.length;
} else { ... }
Make sure to define // #ManuallyTyped as a custom suppress_comment in your flow config file for this to work. You might need an ugly regex for that, see flow docs.
It's been a while since I've last done this, but if I recall correctly Flow will assume that your xStr is a string, while the rest of type checking will work just fine.

How to make a vector of boxed trait objects from another vector of them?

I am making a bot. When the bot receives a message it needs to check all the commands if they trigger on the message and if yes - perform an action.
So I have a vector of commands (Command trait) in main struct:
struct Bot {
cmds: Vec<Box<Command>>,
}
Everything is good until I try to make a list of triggered commands and to use them later in (&self mut) method:
let mut triggered: Vec<Box<command::Command>>;
for c in &self.cmds {
if c.check(&message) {
triggered.push(c.clone());
}
}
Error:
bot.rs:167:44: 167:56 error: mismatched types:
expected `Box<Command>`,
found `&Box<Command>`
(expected box,
found &-ptr) [E0308]
What am I doing wrong here? I tried a lot but nothing helps.
Initially I was doing the following:
for c in &self.cmds {
if c.check(&message) {
c.fire(&message, self);
}
}
but it gave me:
bot.rs:172:46: 172:50 error: cannot borrow `*self` as mutable because `self.cmds` is also borrowed as immutable [E0502]
bot.rs:172
c.fire(&message, self);
So I stackoverflowed it and came to solution above.
What am I doing wrong here? I tried a lot but nothing helps. Initially I was doing the following:
for c in &self.cmds {
if c.check(&message) {
c.fire(&message, self);
}
}
If the fire function does not need access to other commands, an option is to temporarily replace self.cmd with an empty vector:
trait Command {
fn check(&self, message: &str) -> bool;
fn fire(&mut self, bot: &Bot);
}
struct Bot {
cmds: Vec<Box<Command>>,
}
impl Bot {
fn test(&mut self, message: &str) {
use std::mem;
// replace self.cmds with a empty vector and returns the
// replaced vector
let mut cmds = mem::replace(&mut self.cmds, Vec::default());
for c in &mut cmds {
if c.check(message) {
c.fire(self);
}
}
// put back cmds in self.cmds
mem::replace(&mut self.cmds, cmds);
}
}
There are other answers that use this approach.
If fire does need access to some fields of Bot, you can pass only the needed fields instead of self:
c.fire(self.others)

Resources