How to avoid multiple Arc::clone when dealing with multiple closures? - asynchronous

let store = Arc::new(DashMap::new());
tokio::spawn({
let store = Arc::clone(&store);
async move {
stream::iter(vec![17, 19]).for_each(move |x| {
let store = Arc::clone(&store);
async move {
store.insert(x, x);
}
})
}
});
I have to clone two times and do it behind async blocks just to have an ability to use it once deep in the closures.
Are there any ways to simplify such code?

Just leave them out, the only necessary clone is the one inside of iter because each iteration needs it's own Arc.
This works unless there is some requirements not mentioned in the question.
let store = Arc::new(DashMap::new());
tokio::spawn(async move {
stream::iter(vec![17, 19]).for_each(move |x| {
let store = Arc::clone(&store);
async move {
store.insert(x, x);
}
})
});
If you can move the insert outside of the innermost async move block you can get rid of even that clone.

Related

How to integrate async data collection with threadpool data processing in Rust

I'd like to improve the integration of my async data collection with my rayon data processing by overlapping the retrieval and the processing. Currently, I pull lots of pages from a web site using normal async code. Once that is complete, I do the cpu-intensive work using rayon's par_iter.
It seems like I should be able to easily overlap the processing, so that I'm not waiting for every last page before I begin the grunt work. Every page that I retrieve is independent of the others, so there is no need to wait before the conversion.
Here's what I have working currently (simplified just a bit):
use rayon::prelude::*;
use futures::{stream, StreamExt};
use reqwest::{Client, Result};
const CONCURRENT_REQUESTS: usize = usize::MAX;
const MAX_PAGE: usize = 1000;
#[tokio::main]
async fn main() {
// get data from server
let client = Client::new();
let bodies: Vec<Result<String>> = stream::iter(1..MAX_PAGE+1)
.map(|page_number| {
let client = &client;
async move {
client
.get(format!("https://someurl?{page_number}"))
.send()
.await?
.text()
.await
}
})
.buffer_unordered(CONCURRENT_REQUESTS)
.collect()
.await;
// transform the data
let mut rows: Vec<MyRow> = bodies
.par_iter()
.filter_map(|body| body.as_ref().ok())
.map(|data| {
let page = serde_json::from_str::<MyPage>(data).unwrap();
page.rows
.iter()
.map(|x| Row::new(x))
.collect::<Vec<MyRow>>()
})
.flatten()
.collect();
// do something with rows
}

How to run multiple Tokio async tasks in a loop without using tokio::spawn?

I built a LED clock that also displays weather. My program does a couple of different things in a loop, each thing with a different interval:
updates the LEDs every 50ms,
checks the light level (to adjust the brightness) every 1 second,
fetches weather every 10 minutes,
actually some more, but that's irrelevant.
Updating the LEDs is the most critical: I don't want this to be delayed when e.g. weather is being fetched. This should not be a problem as fetching weather is mostly an async HTTP call.
Here's the code that I have:
let mut measure_light_stream = tokio::time::interval(Duration::from_secs(1));
let mut update_weather_stream = tokio::time::interval(WEATHER_FETCH_INTERVAL);
let mut update_leds_stream = tokio::time::interval(UPDATE_LEDS_INTERVAL);
loop {
tokio::select! {
_ = measure_light_stream.tick() => {
let light = lm.get_light();
light_smooth.sp = light;
},
_ = update_weather_stream.tick() => {
let fetched_weather = weather_service.get(&config).await;
// Store the fetched weather for later access from the displaying function.
weather_clock.weather = fetched_weather.clone();
},
_ = update_leds_stream.tick() => {
// Some code here that actually sets the LEDs.
// This code accesses the weather_clock, the light level etc.
},
}
}
I realised the code doesn't do what I wanted it to do - fetching the weather blocks the execution of the loop. I see why - the docs of tokio::select! say the other branches are cancelled as soon as the update_weather_stream.tick() expression completes.
How do I do this in such a way that while fetching the weather is waiting on network, the LEDs are still updated? I figured out I could use tokio::spawn to start a separate non-blocking "thread" for fetching weather, but then I have problems with weather_service not being Send, let alone weather_clock not being shareable between threads. I don't want this complication, I'm fine with everything running in a single thread, just like what select! does.
Reproducible example
use std::time::Duration;
use tokio::time::{interval, sleep};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut slow_stream = interval(Duration::from_secs(3));
let mut fast_stream = interval(Duration::from_millis(200));
// Note how access to this data is straightforward, I do not want
// this to get more complicated, e.g. care about threads and Send.
let mut val = 1;
loop {
tokio::select! {
_ = fast_stream.tick() => {
println!(".{}", val);
},
_ = slow_stream.tick() => {
println!("Starting slow operation...");
// The problem: During this await the dots are not printed.
sleep(Duration::from_secs(1)).await;
val += 1;
println!("...done");
},
}
}
}
You can use tokio::join! to run multiple async operations concurrently within the same task.
Here's an example:
async fn measure_light(halt: &Cell<bool>) {
while !halt.get() {
let light = lm.get_light();
// ....
tokio::time::sleep(Duration::from_secs(1)).await;
}
}
async fn blink_led(halt: &Cell<bool>) {
while !halt.get() {
// LED blinking code
tokio::time::sleep(UPDATE_LEDS_INTERVAL).await;
}
}
async fn poll_weather(halt: &Cell<bool>) {
while !halt.get() {
let weather = weather_service.get(&config).await;
// ...
tokio::time::sleep(WEATHER_FETCH_INTERVAL).await;
}
}
// example on how to terminate execution
async fn terminate(halt: &Cell<bool>) {
tokio::time::sleep(Duration::from_secs(10)).await;
halt.set(true);
}
async fn main() {
let halt = Cell::new(false);
tokio::join!(
measure_light(&halt),
blink_led(&halt),
poll_weather(&halt),
terminate(&halt),
);
}
If you're using tokio::TcpStream or other non-blocking IO, then it should allow for concurrent execution.
I've added a Cell flag for halting execution as an example. You can use the same technique to share any mutable state between join branches.
EDIT: Same thing can be done with tokio::select!. The main difference with your code is that the actual "business logic" is inside the futures awaited by select.
select allows you to drop unfinished futures instead of waiting for them to exit on their own (so halt termination flag is not necessary).
async fn main() {
tokio::select! {
_ = measure_light() => {},
_ = blink_led() = {},
_ = poll_weather() => {},
}
}
Here's a concrete solution, based on the second part of stepan's answer:
use std::time::Duration;
use tokio::time::sleep;
#[tokio::main]
async fn main() {
// Cell is an acceptable complication when accessing the data.
let val = std::cell::Cell::new(1);
tokio::select! {
_ = async {loop {
println!(".{}", val.get());
sleep(Duration::from_millis(200)).await;
}} => {},
_ = async {loop {
println!("Starting slow operation...");
// The problem: During this await the dots are not printed.
sleep(Duration::from_secs(1)).await;
val.set(val.get() + 1);
println!("...done");
sleep(Duration::from_secs(3)).await;
}} => {},
}
}
Playground link

Rust tokio alternative to fold and map to run a function concurrently with different inputs

I need a way to run the same function many times with different inputs.
And since the function depends on a slow web API, I need to run it concurrently and collect the results in one variable.
I use the following:
use tokio_stream::StreamExt;
async fn run(input: &str) -> Vec<String> {
vec![String::from(input), String::from(input)]
}
async fn main() {
let mut input = tokio_stream::iter(vec!["1","2","3","4","5","6","7","8"]);
let mut handles = vec![];
while let Some(domain) = input.next().await {
handles.push(run(domain));
}
let mut results = vec![];
let mut handles = tokio_stream::iter(handles);
while let Some(handle) = handles.next().await {
results.extend(handle.await);
}
}
I know there is a way with the futures crate, but I don't know if I can use it with tokio. Also tokio_stream::StreamExt contains fold and map methods but I can't find a way to use them without calling await.
What is the best way to do this?
IIUC what you want, you can use tokio::spawn to launch your tasks in the background and futures::join_all to wait until they have all completed. E.g. something like this (untested):
async fn run(input: &str) -> Vec<String> {
vec![String::from(input), String::from(input)]
}
async fn main() {
let input = vec!["1","2","3","4","5","6","7","8"];
let handles = input.iter().map (|domain| {
tokio::spawn (async move { run (domain).await })
});
let results = futures::join_all (handles).await;
}

Why does holding a non-Send type across an await point result in a non-Send Future?

In the documentation for the Send trait, there is a nice example of how something like Rc is not Send, since cloning/dropping in two different threads can cause the reference count to get out of sync.
What is less clear, however, is why holding a binding to a non-Send type across an await point in an async fn causes the generated future to also be non-Send. I was able to find a work around for when the compiler has been too conservative in the work-arounds chapter of the async handbook, but it does not go as far as answering the question that I am asking here.
Perhaps someone could shed some light on this with an example of why having a non-Send type in a Future is ok, but holding it across an await is not?
When you use .await in an async function, the compiler builds a state machine behind the scenes. Each .await introduces a new state (while it waits for something) and the code in between are state transitions (aka tasks), which will be triggered based on some external event (e.g. from IO or a timer etc).
Each task gets scheduled to be executed by the async runtime, which could choose to use a different thread from the previous task. If the state transition is not safe to be sent between threads then the resulting Future is also not Send so that you get a compilation error if you try to execute it in a multi-threaded runtime.
It is completely OK for a Future not to be Send, it just means you can only execute it in a single-threaded runtime.
Perhaps someone could shed some light on this with an example of why having a non-Send type in a Future is ok, but holding it across an await is not?
Consider the following simple example:
async fn add_votes(current: Rc<Cell<i32>>, post: Url) {
let new_votes = get_votes(&post).await;
*current += new_votes;
}
The compiler will construct a state machine like this (simplified):
enum AddVotes {
Initial {
current: Rc<Cell<i32>>,
post: Url,
},
WaitingForGetVotes {
current: Rc<Cell<i32>>,
fut: GetVotesFut,
},
}
impl AddVotes {
fn new(current: Rc<Cell<i32>>, post: Url) {
AddVotes::Initial { current, post }
}
fn poll(&mut self) -> Poll {
match self {
AddVotes::Initial(state) => {
let fut = get_votes(&state.post);
*self = AddVotes::WaitingForGetVotes {
current: state.current,
fut
}
Poll::Pending
}
AddVotes::WaitingForGetVotes(state) => {
if let Poll::Ready(votes) = state.fut.poll() {
*state.current += votes;
Poll::Ready(())
} else {
Poll::Pending
}
}
}
}
}
In a multithreaded runtime, each call to poll could be from a different thread, in which case the runtime would move the AddVotes to the other thread before calling poll on it. This won't work because Rc cannot be sent between threads.
However, if the future just used an Rc within the same state transition, it would be fine, e.g. if votes was just an i32:
async fn add_votes(current: i32, post: Url) -> i32 {
let new_votes = get_votes(&post).await;
// use an Rc for some reason:
let rc = Rc::new(1);
println!("rc value: {:?}", rc);
current + new_votes
}
In which case, the state machine would look like this:
enum AddVotes {
Initial {
current: i32,
post: Url,
},
WaitingForGetVotes {
current: i32,
fut: GetVotesFut,
},
}
The Rc isn't captured in the state machine because it is created and dropped within the state transition (task), so the whole state machine (aka Future) is still Send.

Why does tokio::spawn have a delay when called next to crossbeam_channel::select?

I'm creating a task which will spawn other tasks. Some of them will take some time, so they cannot be awaited, but they can run in parallel:
src/main.rs
use crossbeam::crossbeam_channel::{bounded, select};
#[tokio::main]
async fn main() {
let (s, r) = bounded::<usize>(1);
tokio::spawn(async move {
let mut counter = 0;
loop {
let loop_id = counter.clone();
tokio::spawn(async move { // why this one was not fired?
println!("inner task {}", loop_id);
}); // .await.unwrap(); - solves issue, but this is long task which cannot be awaited
println!("loop {}", loop_id);
select! {
recv(r) -> rr => {
// match rr {
// Ok(ee) => {
// println!("received from channel {}", loop_id);
// tokio::spawn(async move {
// println!("received from channel task {}", loop_id);
// });
// },
// Err(e) => println!("{}", e),
// };
},
// more recv(some_channel) ->
}
counter = counter + 1;
}
});
// let s_clone = s.clone();
// tokio::spawn(async move {
// s_clone.send(2).unwrap();
// });
loop {
// rest of the program
}
}
I've noticed strange behavior. This outputs:
loop 0
I was expecting it to also output inner task 0.
If I send a value to channel, the output will be:
loop 0
inner task 0
loop 1
This is missing inner task 1.
Why is inner task spawned with one loop of delay?
The first time I noticed such behavior with 'received from channel task' delayed one loop, but when I reduced code to prepare sample this started to happen with 'inner task'. It might be worth mentioning that if I write second tokio::spawn right to another, only the last one will have this issue. Is there something I should be aware when calling tokio::spawn and select!? What causes this one loop of delay?
Cargo.toml dependencies
[dependencies]
tokio = { version = "0.2", features = ["full"] }
crossbeam = "0.7"
Rust 1.46, Windows 10
select! is blocking, and the docs for tokio::spawn say:
The spawned task may execute on the current thread, or it may be sent to a different thread to be executed.
In this case, the select! "future" is actually a blocking function, and spawn doesn't use a new thread (either in the first invocation or the one inside the loop).
Because you don't tell tokio that you are going to block, tokio doesn't think another thread is needed (from tokio's perspective, you only have 3 futures which should never block, so why would you need another thread anyway?).
The solution is to use the tokio::task::spawn_blocking for the select!-ing closure (which will no longer be a future, so async move {} is now move || {}).
Now tokio will know that this function actually blocks, and will move it to another thread (while keeping all the actual futures in other execution threads).
use crossbeam::crossbeam_channel::{bounded, select};
#[tokio::main]
async fn main() {
let (s, r) = bounded::<usize>(1);
tokio::task::spawn_blocking(move || {
// ...
});
loop {
// rest of the program
}
}
Link to playground
Another possible solution is to use a non-blocking channel like tokio::sync::mpsc, on which you can use await and get the expected behavior, like this playground example with direct recv().await or with tokio::select!, like this:
use tokio::sync::mpsc;
#[tokio::main]
async fn main() {
let (mut s, mut r) = mpsc::channel::<usize>(1);
tokio::spawn(async move {
loop {
// ...
tokio::select! {
Some(i) = r.recv() => {
println!("got = {}", i);
}
}
}
});
loop {
// rest of the program
}
}
Link to playground

Resources