Sharing data between async requests in Tokio

EDIT: I refactored the code to make it simpler.
I'm writing a small program to check a website for dead links, using tokio and reqwest to make the requests asynchronously without threading. I also need to get something back from each request that tokio runs, namely whether the request failed.
fn fetch(req: Vec<&'static str>) {
    let client = Client::new();
    let (tx, rx) = mpsc::channel();
    let req_len = req.len();
    let work = stream::iter_ok(req)
        .map(move |url| client.get(url).send())
        .buffer_unordered(PARALLEL_REQUESTS)
        .then(move |response| {
            let this_tx = tx.clone();
            match response {
                Ok(x) => {
                    format_response(x);
                    this_tx.send(1).unwrap();
                }
                Err(x) => {
                    format_error(x);
                }
            }
            future::ok(())
        })
        .for_each(|_| Ok(()));
    tokio::run(work);
}
The code works, but I'd like some feedback as to what the best way of writing this in Rust would be.

Related

reqwest post request freezes after random amount of time

I started learning Rust two weeks ago and have been writing an application that watches a log file and sends the information in bulk to an Elasticsearch DB.
The problem is that after a certain amount of time it freezes (using 100% CPU), and I don't understand why.
I've cut out a lot of code to try to isolate the issue, but according to the CLion debugger it still freezes on this line:
let _response = reqwest::Client::new()
    .post("http://127.0.0.1/test.php")
    .header("Content-Type", "application/json")
    .body("{\"test\": true}")
    .timeout(Duration::from_secs(30))
    .send() // <-- Exactly here
    .await;
It freezes and doesn't return any error message.
This is the code in context:
use std::{env};
use std::io::{stdout, Write};
use std::path::Path;
use std::time::Duration;
use logwatcher::{LogWatcher, LogWatcherAction};
use serde_json::{json, Value};
use serde_json::Value::Null;
use tokio;
#[tokio::main]
async fn main() {
    let mut log_watcher = LogWatcher::register("/var/log/test.log").unwrap();
    let mut counter = 0;
    let BULK_SIZE = 500;
    log_watcher.watch(&mut move |line: String| { // This triggers each time a new line is appended to /var/log/test.log
        counter += 1;
        if counter >= BULK_SIZE {
            futures::executor::block_on(async { // This has to be async because log_watcher is not async
                let _response = reqwest::Client::new()
                    .post("http://127.0.0.1/test.php") // <-- This is just for testing, it fails towards the DB too
                    .header("Content-Type", "application/json")
                    .body("{\"test\": true}")
                    .timeout(Duration::from_secs(30))
                    .send() // <-- Freezes here
                    .await;
                if _response.is_ok() {
                    println!("Ok");
                }
            });
            counter = 0;
        }
        LogWatcherAction::None
    });
}
The log file gets about 625 new lines every minute. The freeze happens after roughly 5,500 to 25,000 lines have gone through; it seems fairly random.
I suspect the issue has something to do with LogWatcher, reqwest, block_on, or the mix of async executors.
Does anyone have any clue why it randomly freezes?
The problem was indeed caused by mixing tokio's async runtime with futures::executor::block_on, not by reqwest itself.
It was solved by making main non-async and using a tokio runtime's block_on for the async calls instead of futures::executor::block_on.
fn main() {
    let mut log_watcher = LogWatcher::register("/var/log/test.log").unwrap();
    let mut counter = 0;
    let BULK_SIZE = 500;
    log_watcher.watch(&mut move |line: String| {
        counter += 1;
        if counter >= BULK_SIZE {
            tokio::runtime::Builder::new_multi_thread()
                .enable_all()
                .build()
                .unwrap()
                .block_on(async {
                    let _response = reqwest::Client::new()
                        .post("http://127.0.0.1/test.php")
                        .header("Content-Type", "application/json")
                        .body("{\"test\": true}")
                        .timeout(Duration::from_secs(30))
                        .send()
                        .await;
                    if _response.is_ok() {
                        println!("Ok");
                    }
                });
            counter = 0;
        }
        LogWatcherAction::None
    });
}

How to integrate async data collection with threadpool data processing in Rust

I'd like to improve the integration of my async data collection with my rayon data processing by overlapping the retrieval and the processing. Currently, I pull lots of pages from a web site using normal async code. Once that is complete, I do the cpu-intensive work using rayon's par_iter.
It seems like I should be able to easily overlap the processing, so that I'm not waiting for every last page before I begin the grunt work. Every page that I retrieve is independent of the others, so there is no need to wait before the conversion.
Here's what I have working currently (simplified just a bit):
use rayon::prelude::*;
use futures::{stream, StreamExt};
use reqwest::{Client, Result};
const CONCURRENT_REQUESTS: usize = usize::MAX;
const MAX_PAGE: usize = 1000;
#[tokio::main]
async fn main() {
    // get data from server
    let client = Client::new();
    let bodies: Vec<Result<String>> = stream::iter(1..MAX_PAGE+1)
        .map(|page_number| {
            let client = &client;
            async move {
                client
                    .get(format!("https://someurl?{page_number}"))
                    .send()
                    .await?
                    .text()
                    .await
            }
        })
        .buffer_unordered(CONCURRENT_REQUESTS)
        .collect()
        .await;
    // transform the data
    let mut rows: Vec<MyRow> = bodies
        .par_iter()
        .filter_map(|body| body.as_ref().ok())
        .map(|data| {
            let page = serde_json::from_str::<MyPage>(data).unwrap();
            page.rows
                .iter()
                .map(|x| Row::new(x))
                .collect::<Vec<MyRow>>()
        })
        .flatten()
        .collect();
    // do something with rows
}

How to run multiple Tokio async tasks in a loop without using tokio::spawn?

I built a LED clock that also displays weather. My program does a couple of different things in a loop, each thing with a different interval:
updates the LEDs every 50ms,
checks the light level (to adjust the brightness) every 1 second,
fetches weather every 10 minutes,
actually some more, but that's irrelevant.
Updating the LEDs is the most critical: I don't want this to be delayed when e.g. weather is being fetched. This should not be a problem as fetching weather is mostly an async HTTP call.
Here's the code that I have:
let mut measure_light_stream = tokio::time::interval(Duration::from_secs(1));
let mut update_weather_stream = tokio::time::interval(WEATHER_FETCH_INTERVAL);
let mut update_leds_stream = tokio::time::interval(UPDATE_LEDS_INTERVAL);
loop {
    tokio::select! {
        _ = measure_light_stream.tick() => {
            let light = lm.get_light();
            light_smooth.sp = light;
        },
        _ = update_weather_stream.tick() => {
            let fetched_weather = weather_service.get(&config).await;
            // Store the fetched weather for later access from the displaying function.
            weather_clock.weather = fetched_weather.clone();
        },
        _ = update_leds_stream.tick() => {
            // Some code here that actually sets the LEDs.
            // This code accesses the weather_clock, the light level etc.
        },
    }
}
I realised the code doesn't do what I wanted it to do - fetching the weather blocks the execution of the loop. I see why - the docs of tokio::select! say the other branches are cancelled as soon as the update_weather_stream.tick() expression completes.
How do I do this in such a way that while fetching the weather is waiting on network, the LEDs are still updated? I figured out I could use tokio::spawn to start a separate non-blocking "thread" for fetching weather, but then I have problems with weather_service not being Send, let alone weather_clock not being shareable between threads. I don't want this complication, I'm fine with everything running in a single thread, just like what select! does.
Reproducible example
use std::time::Duration;
use tokio::time::{interval, sleep};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut slow_stream = interval(Duration::from_secs(3));
    let mut fast_stream = interval(Duration::from_millis(200));
    // Note how access to this data is straightforward, I do not want
    // this to get more complicated, e.g. care about threads and Send.
    let mut val = 1;
    loop {
        tokio::select! {
            _ = fast_stream.tick() => {
                println!(".{}", val);
            },
            _ = slow_stream.tick() => {
                println!("Starting slow operation...");
                // The problem: During this await the dots are not printed.
                sleep(Duration::from_secs(1)).await;
                val += 1;
                println!("...done");
            },
        }
    }
}
You can use tokio::join! to run multiple async operations concurrently within the same task.
Here's an example:
async fn measure_light(halt: &Cell<bool>) {
    while !halt.get() {
        let light = lm.get_light();
        // ....
        tokio::time::sleep(Duration::from_secs(1)).await;
    }
}
async fn blink_led(halt: &Cell<bool>) {
    while !halt.get() {
        // LED blinking code
        tokio::time::sleep(UPDATE_LEDS_INTERVAL).await;
    }
}
async fn poll_weather(halt: &Cell<bool>) {
    while !halt.get() {
        let weather = weather_service.get(&config).await;
        // ...
        tokio::time::sleep(WEATHER_FETCH_INTERVAL).await;
    }
}
// example on how to terminate execution
async fn terminate(halt: &Cell<bool>) {
    tokio::time::sleep(Duration::from_secs(10)).await;
    halt.set(true);
}
#[tokio::main]
async fn main() {
    let halt = Cell::new(false);
    tokio::join!(
        measure_light(&halt),
        blink_led(&halt),
        poll_weather(&halt),
        terminate(&halt),
    );
}
If you're using tokio::net::TcpStream or other non-blocking IO, the branches can make progress concurrently.
I've added a Cell flag for halting execution as an example. You can use the same technique to share any mutable state between join branches.
EDIT: Same thing can be done with tokio::select!. The main difference with your code is that the actual "business logic" is inside the futures awaited by select.
select allows you to drop unfinished futures instead of waiting for them to exit on their own (so halt termination flag is not necessary).
#[tokio::main]
async fn main() {
    tokio::select! {
        _ = measure_light() => {},
        _ = blink_led() => {},
        _ = poll_weather() => {},
    }
}
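The cancellation that select! relies on is ordinary future dropping: a future that is dropped while suspended at an .await never resumes, and the values it holds are dropped with it. The following is a minimal sketch of that behaviour using only the standard library, polling by hand instead of using tokio. Never, Guard, NoopWake and poll_once_then_drop are hypothetical names made up for this illustration.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A future that never completes; stands in for a long sleep or network wait.
struct Never;

impl Future for Never {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// Sets a flag when dropped, so the cleanup can be observed.
struct Guard(Rc<Cell<bool>>);

impl Drop for Guard {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

/// A waker that does nothing; enough for polling by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future once, then drops it.
/// Returns (cleanup_ran, code_after_await_ran).
fn poll_once_then_drop() -> (bool, bool) {
    let cleaned_up = Rc::new(Cell::new(false));
    let reached_end = Rc::new(Cell::new(false));

    let guard_flag = Rc::clone(&cleaned_up);
    let end_flag = Rc::clone(&reached_end);
    let mut fut = Box::pin(async move {
        let _guard = Guard(guard_flag);
        Never.await; // the future is suspended here when it gets dropped
        end_flag.set(true); // never runs
    });

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    assert!(fut.as_mut().poll(&mut cx).is_pending());

    // Dropping the suspended future is the cancellation step.
    drop(fut);
    (cleaned_up.get(), reached_end.get())
}
```

Calling poll_once_then_drop() returns (true, false): the guard's destructor ran, and the code after the .await did not. This is why the select! version needs no halt flag.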
Here's a concrete solution, based on the second part of stepan's answer:
use std::time::Duration;
use tokio::time::sleep;
#[tokio::main]
async fn main() {
    // Cell is an acceptable complication when accessing the data.
    let val = std::cell::Cell::new(1);
    tokio::select! {
        _ = async {loop {
            println!(".{}", val.get());
            sleep(Duration::from_millis(200)).await;
        }} => {},
        _ = async {loop {
            println!("Starting slow operation...");
            // During this await the dots keep being printed.
            sleep(Duration::from_secs(1)).await;
            val.set(val.get() + 1);
            println!("...done");
            sleep(Duration::from_secs(3)).await;
        }} => {},
    }
}

Why is async TcpStream blocking?

I'm working on a project to implement a distributed key-value store in Rust. I wrote the server-side code using Tokio's asynchronous runtime. I'm running into an issue where my asynchronous code seems to block: when I have multiple connections to the server, only one TcpStream is processed. I'm new to async code, both in general and in Rust, but I thought that other streams would be accepted and processed whenever there was no activity on a given TCP stream.
Is my understanding of async wrong or am I using tokio incorrectly?
This is my entry point:
use std::error::Error;
use std::net::SocketAddr;
use std::path::{Path, PathBuf};
use std::str::FromStr;
use std::sync::{Arc, Mutex};
use env_logger;
use log::{debug, info};
use structopt::StructOpt;
use tokio::net::TcpListener;
extern crate blue;
use blue::ipc::message;
use blue::store::args;
use blue::store::cluster::{Cluster, NodeRole};
use blue::store::deserialize::deserialize_store;
use blue::store::handler::handle_stream;
use blue::store::wal::WriteAheadLog;
#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
    let opt = args::Opt::from_args();
    let addr = SocketAddr::from_str(format!("{}:{}", opt.host, opt.port).as_str())?;
    let role = NodeRole::from_str(opt.role.as_str()).unwrap();
    let leader_addr = match role {
        NodeRole::Leader => addr,
        NodeRole::Follower => SocketAddr::from_str(opt.follow.unwrap().as_str())?,
    };
    let wal_name = addr.to_string().replace(".", "").replace(":", "");
    let wal_full_name = format!("wal{}.log", wal_name);
    let wal_path = PathBuf::from(wal_full_name);
    let mut wal = match wal_path.exists() {
        true => {
            info!("Existing WAL found");
            WriteAheadLog::open(&wal_path)?
        }
        false => {
            info!("Creating WAL");
            WriteAheadLog::new(&wal_path)?
        }
    };
    debug!("WAL: {:?}", wal);
    let store_name = addr.to_string().replace(".", "").replace(":", "");
    let store_pth = format!("{}.pb", store_name);
    let store_path = Path::new(&store_pth);
    let mut store = match store_path.exists() {
        true => deserialize_store(store_path)?,
        false => message::Store::default(),
    };
    let listener = TcpListener::bind(addr).await?;
    let cluster = Cluster::new(addr, &role, leader_addr, &mut wal, &mut store).await?;
    let store_path = Arc::new(store_path);
    let store = Arc::new(Mutex::new(store));
    let wal = Arc::new(Mutex::new(wal));
    let cluster = Arc::new(Mutex::new(cluster));
    info!("Blue launched. Waiting for incoming connection");
    loop {
        let (stream, addr) = listener.accept().await?;
        info!("Incoming request from {}", addr);
        let store = Arc::clone(&store);
        let store_path = Arc::clone(&store_path);
        let wal = Arc::clone(&wal);
        let cluster = Arc::clone(&cluster);
        handle_stream(stream, store, store_path, wal, cluster, &role).await?;
    }
}
Below is my handler (handle_stream from the above). I excluded all the handlers in match input as I didn't think they were necessary to prove the point (full code for that section is here: https://github.com/matthewmturner/Bradfield-Distributed-Systems/blob/main/blue/src/store/handler.rs if it actually helps).
Specifically the point that is blocking is the line let input = async_read_message::<message::Request>(&mut stream).await;
This is where the server is waiting for communication from either a client or another server in the cluster. The behavior I currently see is that after connecting to server with client the server doesn't receive any of the requests to add other nodes to the cluster - it only handles the client stream.
use std::io;
use std::net::{SocketAddr, TcpStream};
use std::path::Path;
use std::str::FromStr;
use std::sync::{Arc, Mutex};
use log::{debug, error, info};
use serde_json::json;
use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream as asyncTcpStream;
use super::super::ipc::message;
use super::super::ipc::message::request::Command;
use super::super::ipc::receiver::async_read_message;
use super::super::ipc::sender::{async_send_message, send_message};
use super::cluster::{Cluster, NodeRole};
use super::serialize::persist_store;
use super::wal::WriteAheadLog;
// TODO: Why isnt async working? I.e. connecting servers after client is connected stays on client stream.
pub async fn handle_stream<'a>(
    mut stream: asyncTcpStream,
    store: Arc<Mutex<message::Store>>,
    store_path: Arc<&Path>,
    wal: Arc<Mutex<WriteAheadLog<'a>>>,
    cluster: Arc<Mutex<Cluster>>,
    role: &NodeRole,
) -> io::Result<()> {
    loop {
        info!("Handling stream: {:?}", stream);
        let input = async_read_message::<message::Request>(&mut stream).await;
        debug!("Input: {:?}", input);
        match input {
            ...
        }
    }
}
This is the code for async_read_message
pub async fn async_read_message<M: Message + Default>(
    stream: &mut asyncTcpStream,
) -> io::Result<M> {
    let mut len_buf = [0u8; 4];
    debug!("Reading message length");
    stream.read_exact(&mut len_buf).await?;
    let len = i32::from_le_bytes(len_buf);
    let mut buf = vec![0u8; len as usize];
    debug!("Reading message");
    stream.read_exact(&mut buf).await?;
    let user_input = M::decode(&mut buf.as_slice())?;
    debug!("Received message: {:?}", user_input);
    Ok(user_input)
}
Your problem lies with how you're handling messages after clients have connected:
handle_stream(stream, store, store_path, wal, cluster, &role).await?;
This .await means your listening loop will wait for handle_stream to return, but (making some assumptions) this function won't return until the client has disconnected. What you want is to tokio::spawn a new task that can run independently:
tokio::spawn(handle_stream(stream, store, store_path, wal, cluster, &role));
You may have to change some of your parameter types to avoid lifetimes; tokio::spawn requires 'static since the task's lifetime is decoupled from the scope where it was spawned.
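One way to satisfy a 'static bound is to hand each spawned unit of work only owned values: cloned Arc handles, a PathBuf instead of a &Path, and a Copy role instead of a &NodeRole. The sketch below shows that pattern with std::thread::spawn rather than tokio::spawn, purely because it needs no external crate; both functions require their argument to be Send + 'static. Role, handle_connection and serve are hypothetical stand-ins, not the types from the question.

```rust
use std::path::PathBuf;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for NodeRole; Copy makes it cheap to pass by value.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Role {
    Leader,
    Follower,
}

/// Hypothetical stand-in for handle_stream. It takes only owned arguments,
/// so nothing borrows from the caller's stack frame.
fn handle_connection(id: u32, store: Arc<Mutex<Vec<u32>>>, _path: Arc<PathBuf>, role: Role) {
    if role == Role::Leader {
        store.lock().unwrap().push(id);
    }
}

/// Spawns one worker per "connection" and returns the ids they recorded.
fn serve(connections: u32, role: Role) -> Vec<u32> {
    let store = Arc::new(Mutex::new(Vec::new()));
    // An owned PathBuf inside the Arc, instead of a borrowed &Path.
    let path = Arc::new(PathBuf::from("store.pb"));

    let handles: Vec<_> = (0..connections)
        .map(|id| {
            // Each worker gets its own handles to the shared data.
            let store = Arc::clone(&store);
            let path = Arc::clone(&path);
            // `move` transfers ownership into the worker; `role` is copied.
            thread::spawn(move || handle_connection(id, store, path, role))
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let mut ids = store.lock().unwrap().clone();
    ids.sort();
    ids
}
```

For example, serve(3, Role::Leader) returns [0, 1, 2]. With tokio the shape is the same, except that the spawned item is a future and the accept loop does not wait for it to finish.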

Why does holding a non-Send type across an await point result in a non-Send Future?

In the documentation for the Send trait, there is a nice example of how something like Rc is not Send, since cloning/dropping in two different threads can cause the reference count to get out of sync.
What is less clear, however, is why holding a binding to a non-Send type across an await point in an async fn causes the generated future to also be non-Send. I was able to find a work around for when the compiler has been too conservative in the work-arounds chapter of the async handbook, but it does not go as far as answering the question that I am asking here.
Perhaps someone could shed some light on this with an example of why having a non-Send type in a Future is ok, but holding it across an await is not?
When you use .await in an async function, the compiler builds a state machine behind the scenes. Each .await introduces a new state (in which the function waits for something), and the code between the awaits makes up the state transitions (aka tasks), which are triggered by some external event (e.g. from IO or a timer).
Each task gets scheduled to be executed by the async runtime, which could choose to use a different thread from the previous task. If the state transition is not safe to be sent between threads then the resulting Future is also not Send so that you get a compilation error if you try to execute it in a multi-threaded runtime.
It is completely OK for a Future not to be Send, it just means you can only execute it in a single-threaded runtime.
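As a rough illustration of that last point, here is a sketch of a future that holds an Rc across an .await and is still driven to completion, as long as it stays on one thread. It uses only the standard library; block_on here is a deliberately tiny hand-written executor (not tokio's), and get_votes is a hypothetical stand-in for a real network call.

```rust
use std::cell::Cell;
use std::future::{ready, Future};
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for a busy-polling executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal single-threaded executor: polls the future on the calling
/// thread until it completes. Note that F has no Send bound.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Hypothetical stand-in for a network call.
async fn get_votes() -> i32 {
    ready(2).await
}

/// `current` is an Rc that is still alive after the await,
/// so the future returned by this function is not Send.
async fn add_votes(current: Rc<Cell<i32>>) {
    let new_votes = get_votes().await;
    current.set(current.get() + new_votes);
}

/// Runs the non-Send future on the current thread.
fn run_on_current_thread() -> i32 {
    let votes = Rc::new(Cell::new(1));
    block_on(add_votes(Rc::clone(&votes)));
    // Moving the future to another thread would be rejected at compile time:
    // std::thread::spawn(move || block_on(add_votes(votes)));
    votes.get()
}
```

run_on_current_thread() returns 3. The restriction only bites when something demands Send, such as std::thread::spawn or a multi-threaded runtime's spawn function.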
Perhaps someone could shed some light on this with an example of why having a non-Send type in a Future is ok, but holding it across an await is not?
Consider the following simple example:
async fn add_votes(current: Rc<Cell<i32>>, post: Url) {
    let new_votes = get_votes(&post).await;
    current.set(current.get() + new_votes);
}
The compiler will construct a state machine like this (simplified):
enum AddVotes {
    Initial {
        current: Rc<Cell<i32>>,
        post: Url,
    },
    WaitingForGetVotes {
        current: Rc<Cell<i32>>,
        fut: GetVotesFut,
    },
}
impl AddVotes {
    fn new(current: Rc<Cell<i32>>, post: Url) -> Self {
        AddVotes::Initial { current, post }
    }
    // Simplified: the real Future::poll takes Pin<&mut Self> and a Context.
    fn poll(&mut self) -> Poll<()> {
        match self {
            AddVotes::Initial { current, post } => {
                let fut = get_votes(post);
                *self = AddVotes::WaitingForGetVotes {
                    current: current.clone(),
                    fut,
                };
                Poll::Pending
            }
            AddVotes::WaitingForGetVotes { current, fut } => {
                if let Poll::Ready(votes) = fut.poll() {
                    current.set(current.get() + votes);
                    Poll::Ready(())
                } else {
                    Poll::Pending
                }
            }
        }
    }
}
In a multithreaded runtime, each call to poll could be from a different thread, in which case the runtime would move the AddVotes to the other thread before calling poll on it. This won't work because Rc cannot be sent between threads.
However, if the future just used an Rc within the same state transition, it would be fine, e.g. if votes was just an i32:
async fn add_votes(current: i32, post: Url) -> i32 {
    let new_votes = get_votes(&post).await;
    // use an Rc for some reason:
    let rc = Rc::new(1);
    println!("rc value: {:?}", rc);
    current + new_votes
}
In which case, the state machine would look like this:
enum AddVotes {
    Initial {
        current: i32,
        post: Url,
    },
    WaitingForGetVotes {
        current: i32,
        fut: GetVotesFut,
    },
}
The Rc isn't captured in the state machine because it is created and dropped within the state transition (task), so the whole state machine (aka Future) is still Send.
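One way to check this claim is to let the compiler do it: std::thread::spawn only accepts closures that are Send, so moving the future into a spawned thread compiles only if the future is Send. The sketch below does that using only the standard library. block_on is a tiny hand-written executor, and get_votes is a hypothetical stand-in without the Url parameter.

```rust
use std::future::{ready, Future};
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// A waker that does nothing; enough for a busy-polling executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor: polls the future on the calling thread until done.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Hypothetical stand-in for a network call.
async fn get_votes() -> i32 {
    ready(2).await
}

/// The Rc is created after the only await and dropped at the end of the
/// function, so it is never stored in the state machine.
async fn add_votes(current: i32) -> i32 {
    let new_votes = get_votes().await;
    let rc = Rc::new(1);
    current + new_votes + *rc
}

/// Creates the future on this thread and completes it on another one.
/// This compiles only because the future is Send.
fn run_on_other_thread() -> i32 {
    let fut = add_votes(1);
    thread::spawn(move || block_on(fut)).join().unwrap()
}
```

run_on_other_thread() returns 4. If add_votes were changed to take the Rc as a parameter, or to create it before the await and use it afterwards, the thread::spawn line would stop compiling.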

Resources