HTTP request not being sent async - asynchronous

I am trying to write a program that takes a list of URLs and uses async requests to retrieve the status code and body of each. This is what it currently looks like, but the requests don't seem to be sent asynchronously; they go out one by one.
extern crate futures;
use futures::{stream, StreamExt};
use reqwest::{Client as http, StatusCode};
use std::time::Duration;
#[tokio::main]
async fn main() {
    let client_builder = http::builder()
        .connect_timeout(Duration::from_secs(5))
        .danger_accept_invalid_certs(true)
        .redirect(reqwest::redirect::Policy::none())
        .timeout(Duration::from_secs(5))
        .user_agent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36");
    let client = client_builder.build().unwrap();
    let mut urls = vec![];
    for _ in 0..100 {
        urls.push("https://google.com:443".to_string());
    }
    let results = stream::iter(urls)
        .filter_map(|url| async {
            let response = (&client)
                .get(&url)
                .send()
                .await
                .ok()?; // Result<T, E> => Option<T> => T
            Some((url, response))
        })
        .filter_map(|(url, response)| async {
            let status = response.status();
            let body = response
                .text()
                .await
                .ok()?; // Result<T, E> => Option<T> => T
            println!("{}", url);
            Some((url, status, body))
        })
        .map(|elem| futures::future::ready(elem))
        .buffer_unordered(40)
        .collect::<Vec<(String, StatusCode, String)>>()
        .await;
}
Changing .buffer_unordered(40) to .buffer_unordered(1) makes no difference in speed, which confirms that the requests are not being sent concurrently. The print output also makes it obvious that the program waits for each response individually.
What do I need to modify in order for my requests to be sent async?
It is also important for me that I can control concurrency using .buffer_unordered() and that the results are collected into a Vec, not printed, so removing the .collect() at the end won't work.

Firstly, this line is not good:
.map(|elem| futures::future::ready(elem))
It forces each item to be fully resolved before it reaches buffer_unordered(): by the time an elem (url, status, body) exists, its request has already been sent and its body read inside the two filter_map stages, and filter_map processes one item at a time.
So with this line in place, buffer_unordered() only ever receives futures that are already complete, and there is nothing left for it to run concurrently. buffer_unordered can't magically undo what happens before it.
Secondly, once you remove this line, you'll see that buffer_unordered expects Future items (so it can hold several of them and poll them concurrently), but you are giving it (url, status, body) tuples from filter_map.
So instead of filter_map you could use a plain "map" to map urls to async blocks (which are futures):
let results = stream::iter(urls)
    .map(|url| async {
        // nothing runs yet at this point;
        // this just converts url -> Future
        let response = (&client)
            .get(&url)
            .send()
            .await
            .ok()?; // Result<T, E> => Option<T> => T
        let status = response.status();
        let body = response
            .text()
            .await
            .ok()?; // Result<T, E> => Option<T> => T
        println!("{} ready", url);
        Some((url, status, body))
    })
    .buffer_unordered(40) // take up to 40 futures (async blocks)
                          // and run them concurrently
    ...
You could make this more explicit by building the futures first (an async block does nothing until it is polled, so nothing runs at this step) and then driving them through the stream:
let futures = urls.into_iter().map(|url| async { ... });
let results = stream::iter(futures)
    .buffer_unordered(40)
    ...
If you want to drop the failed requests (the None values), you can do this after buffer_unordered:
.buffer_unordered(40)
.filter_map(|result| result)
...
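The answer relies on the fact that an async block does not run until something polls it. The std-only sketch below makes that visible without tokio, futures or reqwest; block_on, ThreadWaker and lazy_demo are hypothetical helpers written for this illustration, not library functions. This block_on drives one future at a time, so it shows laziness, not concurrency; in the answer, the concurrency comes from buffer_unordered polling up to 40 such futures together.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Hypothetical minimal executor: wakes by unparking the thread that is polling.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Polls one future to completion, parking the thread while it is pending.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

// Returns (bodies run before polling, bodies run after polling, results).
fn lazy_demo() -> (usize, usize, Vec<u32>) {
    let started = AtomicUsize::new(0);
    let started_ref = &started;

    // Like `.map(|url| async { ... })`: this only builds futures, it runs nothing.
    let futures: Vec<_> = (0..3u32)
        .map(move |i| async move {
            started_ref.fetch_add(1, Ordering::SeqCst);
            i * 10
        })
        .collect();
    let before = started.load(Ordering::SeqCst);

    // Polling is what actually executes each async block's body.
    let results: Vec<u32> = futures.into_iter().map(block_on).collect();
    let after = started.load(Ordering::SeqCst);
    (before, after, results)
}

fn main() {
    let (before, after, results) = lazy_demo();
    println!("before polling: {before}, after polling: {after}, results: {results:?}");
}
```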

Related

How to deal with non-Send futures in a Tokio spawn context?

Tokio's spawn can only work with a Send future. This makes the following code invalid:
async fn async_foo(v: i32) {}
async fn async_computation() -> Result<i32, Box<dyn std::error::Error>> {
    Ok(1)
}
async fn async_start() {
    match async_computation().await {
        Ok(v) => async_foo(v).await,
        _ => unimplemented!(),
    };
}
#[tokio::main]
async fn main() {
    tokio::spawn(async move {
        async_start().await;
    });
}
The error is:
future cannot be sent between threads safely
the trait `Send` is not implemented for `dyn std::error::Error`
If I understand correctly: because async_foo(v).await might yield, Rust has to store all live state inside the future, which may be resumed on a different thread. Hence the result of async_computation().await, which is still alive at that point, must be Send, and dyn std::error::Error is not.
This could be mitigated if the non-Send type can be dropped before the .await, such as:
async fn async_start() {
    let result;
    match async_computation().await {
        Ok(v) => result = v,
        _ => return,
    };
    async_foo(result).await;
}
However once another .await is needed before the non-Send type is dropped, the workarounds are more and more awkward.
What is a good practice for this, especially when the non-Send type is for generic error handling (the Box<dyn std::error::Error>)? Is there a better error type that common I/O operations (async or not) implement in Rust? Or is there a better way to group nested async calls?
Most errors are Send so you can just change the return type to:
Box<dyn std::error::Error + Send>
It's also common to have + Sync.
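A minimal sketch of that change applied to the question's code, using only std. The BoxError alias, parse_number and assert_send are hypothetical names added for illustration; assert_send only compiles when its argument is Send, which is the same requirement tokio::spawn places on the future it is given, so creating the future (without polling it) is enough to check the bound.

```rust
use std::error::Error;

// Boxed error that may cross threads: the `+ Send + Sync` bounds are the fix.
type BoxError = Box<dyn Error + Send + Sync>;

async fn async_foo(_v: i32) {}

async fn async_computation() -> Result<i32, BoxError> {
    Ok(1)
}

async fn async_start() {
    // The Result (and its BoxError) stays alive across `async_foo(v).await`,
    // so it must be Send for the whole future to be Send.
    match async_computation().await {
        Ok(v) => async_foo(v).await,
        Err(e) => eprintln!("computation failed: {e}"),
    }
}

// `?` still converts ordinary std errors (here ParseIntError) into BoxError.
fn parse_number(s: &str) -> Result<i32, BoxError> {
    Ok(s.trim().parse::<i32>()?)
}

// Compiles only if T is Send.
fn assert_send<T: Send>(_: &T) {}

fn main() {
    let fut = async_start(); // created, never polled
    assert_send(&fut); // would not compile with a plain Box<dyn Error>
    println!("{:?}", parse_number("42").ok());
}
```

Adding Sync as well lets the boxed error be shared by reference across threads, which some error-handling code expects.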

How to integrate async data collection with threadpool data processing in Rust

I'd like to improve the integration of my async data collection with my rayon data processing by overlapping the retrieval and the processing. Currently, I pull lots of pages from a web site using normal async code. Once that is complete, I do the cpu-intensive work using rayon's par_iter.
It seems like I should be able to easily overlap the processing, so that I'm not waiting for every last page before I begin the grunt work. Every page that I retrieve is independent of the others, so there is no need to wait before the conversion.
Here's what I have working currently (simplified just a bit):
use rayon::prelude::*;
use futures::{stream, StreamExt};
use reqwest::{Client, Result};
const CONCURRENT_REQUESTS: usize = usize::MAX;
const MAX_PAGE: usize = 1000;
#[tokio::main]
async fn main() {
    // get data from server
    let client = Client::new();
    let bodies: Vec<Result<String>> = stream::iter(1..MAX_PAGE + 1)
        .map(|page_number| {
            let client = &client;
            async move {
                client
                    .get(format!("https://someurl?{page_number}"))
                    .send()
                    .await?
                    .text()
                    .await
            }
        })
        .buffer_unordered(CONCURRENT_REQUESTS)
        .collect()
        .await;
    // transform the data
    let mut rows: Vec<MyRow> = bodies
        .par_iter()
        .filter_map(|body| body.as_ref().ok())
        .map(|data| {
            let page = serde_json::from_str::<MyPage>(data).unwrap();
            page.rows
                .iter()
                .map(|x| Row::new(x))
                .collect::<Vec<MyRow>>()
        })
        .flatten()
        .collect();
    // do something with rows
}
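One common way to get this overlap is a channel: the fetching side sends each page as soon as it arrives, and the processing side starts working on the first page while later pages are still in flight. The std-only sketch below shows the shape under stated assumptions: a plain thread stands in for the async fetch loop, sequential iteration stands in for rayon, and fetch_page, parse_rows and pipeline are hypothetical placeholders. In the real program, the buffer_unordered stream could call tx.send(body) for each page (std's unbounded Sender::send never blocks, so calling it from async code is fine), and the consumer could run on its own thread (for example via tokio::task::spawn_blocking) using rayon's rx.into_iter().par_bridge(), which processes items in parallel but does not preserve their order.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for the HTTP request: returns a fake page body.
fn fetch_page(page_number: usize) -> String {
    format!("{},{}", page_number, page_number * 2)
}

// Hypothetical stand-in for the CPU-heavy serde_json + row conversion work.
fn parse_rows(body: &str) -> Vec<usize> {
    body.split(',').filter_map(|s| s.parse().ok()).collect()
}

fn pipeline(max_page: usize) -> Vec<usize> {
    let (tx, rx) = mpsc::channel::<String>();

    // Producer: hands each page to the consumer as soon as it is available.
    let producer = thread::spawn(move || {
        for page in 1..=max_page {
            // send fails only if the receiver is gone; stop fetching in that case
            if tx.send(fetch_page(page)).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which ends the consumer's iteration below
    });

    // Consumer: starts parsing page 1 while later pages are still being produced.
    let rows: Vec<usize> = rx.iter().flat_map(|body| parse_rows(&body)).collect();
    producer.join().expect("producer thread panicked");
    rows
}

fn main() {
    println!("{:?}", pipeline(3));
}
```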

".eth() method not found" when returning Http struct in a function

I want to establish an http connection to a Ganache test blockchain.
Going through the GitHub page of the web3 crate I found this example:
#[tokio::main]
async fn main() -> web3::Result<()> {
    let _ = env_logger::try_init();
    let transport = web3::transports::Http::new("http://localhost:7545")?;
    let web3 = web3::Web3::new(transport);
    let mut accounts = web3.eth().accounts().await?;
    ...
    Ok(())
}
However I want to implement the connection setup in a function. So I tried the following:
async fn establish_web3_connection_http(url: &str) -> web3::Result<Web3<Http>> {
    let transport = web3::transports::Http::new(url)?;
    Ok(web3::Web3::new(transport))
}
...
#[tokio::main]
async fn main() -> web3::Result<()> {
    let web3_con = establish_web3_connection_http("http://localhost:7545");
    println!("Calling accounts.");
    let mut accounts = web3_con.eth().accounts().await?;
    Ok(())
}
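This results in an error. (The error output is not preserved here; going by the title, the compiler reports that no method named eth() is found.)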
I am not sure why I do not get the correct value back. There is no error when I don't call web3_con, so the function itself seems to be fine.
Is the return value somehow wrong, or the way I call it?
establish_web3_connection_http() is an async function, so it returns a future. You're trying to call .eth() on the future, when you probably want to call it on the value produced by the future. You need to await the result of this function:
let web3_con = establish_web3_connection_http("http://localhost:7545").await?; // note the added .await?
However, you don't do any awaiting in establish_web3_connection_http(), so there's no reason it needs to be async in the first place. You could just remove async from its signature instead:
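fn establish_web3_connection_http(url: &str) -> web3::Result<Web3<Http>> {
    let transport = web3::transports::Http::new(url)?;
    Ok(web3::Web3::new(transport))
}
and then call it without .await:
let web3_con = establish_web3_connection_http("http://localhost:7545")?;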

Why is async TcpStream blocking?

I'm working on a project to implement a distributed key-value store in Rust. I've written the server-side code using Tokio's asynchronous runtime. I'm running into an issue where my asynchronous code seems to block: when I have multiple connections to the server, only one TcpStream is processed. I'm new to writing async code, both in general and in Rust, but I thought that other streams would be accepted and processed whenever there was no activity on a given TCP stream.
Is my understanding of async wrong or am I using tokio incorrectly?
This is my entry point:
use std::error::Error;
use std::net::SocketAddr;
use std::path::{Path, PathBuf};
use std::str::FromStr;
use std::sync::{Arc, Mutex};
use env_logger;
use log::{debug, info};
use structopt::StructOpt;
use tokio::net::TcpListener;
extern crate blue;
use blue::ipc::message;
use blue::store::args;
use blue::store::cluster::{Cluster, NodeRole};
use blue::store::deserialize::deserialize_store;
use blue::store::handler::handle_stream;
use blue::store::wal::WriteAheadLog;
#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    env_logger::Builder::from_env(env_logger::Env::default().default_filter_or("info")).init();
    let opt = args::Opt::from_args();
    let addr = SocketAddr::from_str(format!("{}:{}", opt.host, opt.port).as_str())?;
    let role = NodeRole::from_str(opt.role.as_str()).unwrap();
    let leader_addr = match role {
        NodeRole::Leader => addr,
        NodeRole::Follower => SocketAddr::from_str(opt.follow.unwrap().as_str())?,
    };
    let wal_name = addr.to_string().replace(".", "").replace(":", "");
    let wal_full_name = format!("wal{}.log", wal_name);
    let wal_path = PathBuf::from(wal_full_name);
    let mut wal = match wal_path.exists() {
        true => {
            info!("Existing WAL found");
            WriteAheadLog::open(&wal_path)?
        }
        false => {
            info!("Creating WAL");
            WriteAheadLog::new(&wal_path)?
        }
    };
    debug!("WAL: {:?}", wal);
    let store_name = addr.to_string().replace(".", "").replace(":", "");
    let store_pth = format!("{}.pb", store_name);
    let store_path = Path::new(&store_pth);
    let mut store = match store_path.exists() {
        true => deserialize_store(store_path)?,
        false => message::Store::default(),
    };
    let listener = TcpListener::bind(addr).await?;
    let cluster = Cluster::new(addr, &role, leader_addr, &mut wal, &mut store).await?;
    let store_path = Arc::new(store_path);
    let store = Arc::new(Mutex::new(store));
    let wal = Arc::new(Mutex::new(wal));
    let cluster = Arc::new(Mutex::new(cluster));
    info!("Blue launched. Waiting for incoming connection");
    loop {
        let (stream, addr) = listener.accept().await?;
        info!("Incoming request from {}", addr);
        let store = Arc::clone(&store);
        let store_path = Arc::clone(&store_path);
        let wal = Arc::clone(&wal);
        let cluster = Arc::clone(&cluster);
        handle_stream(stream, store, store_path, wal, cluster, &role).await?;
    }
}
Below is my handler (handle_stream from the above). I excluded all the handlers in match input as I didn't think they were necessary to prove the point (full code for that section is here: https://github.com/matthewmturner/Bradfield-Distributed-Systems/blob/main/blue/src/store/handler.rs if it actually helps).
Specifically the point that is blocking is the line let input = async_read_message::<message::Request>(&mut stream).await;
This is where the server is waiting for communication from either a client or another server in the cluster. The behavior I currently see is that after connecting to server with client the server doesn't receive any of the requests to add other nodes to the cluster - it only handles the client stream.
use std::io;
use std::net::{SocketAddr, TcpStream};
use std::path::Path;
use std::str::FromStr;
use std::sync::{Arc, Mutex};
use log::{debug, error, info};
use serde_json::json;
use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream as asyncTcpStream;
use super::super::ipc::message;
use super::super::ipc::message::request::Command;
use super::super::ipc::receiver::async_read_message;
use super::super::ipc::sender::{async_send_message, send_message};
use super::cluster::{Cluster, NodeRole};
use super::serialize::persist_store;
use super::wal::WriteAheadLog;
// TODO: Why isn't async working? I.e. connecting servers after client is connected stays on client stream.
pub async fn handle_stream<'a>(
    mut stream: asyncTcpStream,
    store: Arc<Mutex<message::Store>>,
    store_path: Arc<&Path>,
    wal: Arc<Mutex<WriteAheadLog<'a>>>,
    cluster: Arc<Mutex<Cluster>>,
    role: &NodeRole,
) -> io::Result<()> {
    loop {
        info!("Handling stream: {:?}", stream);
        let input = async_read_message::<message::Request>(&mut stream).await;
        debug!("Input: {:?}", input);
        match input {
            ...
        }
    }
}
This is the code for async_read_message
pub async fn async_read_message<M: Message + Default>(
    stream: &mut asyncTcpStream,
) -> io::Result<M> {
    let mut len_buf = [0u8; 4];
    debug!("Reading message length");
    stream.read_exact(&mut len_buf).await?;
    let len = i32::from_le_bytes(len_buf);
    let mut buf = vec![0u8; len as usize];
    debug!("Reading message");
    stream.read_exact(&mut buf).await?;
    let user_input = M::decode(&mut buf.as_slice())?;
    debug!("Received message: {:?}", user_input);
    Ok(user_input)
}
Your problem lies with how you're handling messages after clients have connected:
handle_stream(stream, store, store_path, wal, cluster, &role).await?;
This .await means your listening loop will wait for handle_stream to return, but (making some assumptions) this function won't return until the client has disconnected. What you want is to tokio::spawn a new task that can run independently:
tokio::spawn(handle_stream(stream, store, store_path, wal, cluster, &role));
You may have to change some of your parameter types to avoid lifetimes; tokio::spawn requires 'static since the task's lifetime is decoupled from the scope where it was spawned.
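std::thread::spawn has the same Send + 'static bound as tokio::spawn, so a std-only sketch can show the kind of parameter changes meant here: pass owned values instead of &NodeRole, and Arc<PathBuf> instead of Arc<&Path>. NodeRole below is a hypothetical, simplified mirror of the question's type (made Copy for brevity; cloning it or wrapping it in an Arc works too), and handle_connection and serve are hypothetical names. In the tokio version, the loop body would call tokio::spawn(async move { ... }) with the same cloned values instead of spawning a thread.

```rust
use std::path::PathBuf;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical, simplified mirror of the question's NodeRole.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum NodeRole {
    Leader,
    Follower,
}

// Every parameter is owned (or an Arc of owned data), so nothing borrows from the caller.
fn handle_connection(id: usize, store: Arc<Mutex<Vec<String>>>, store_path: Arc<PathBuf>, role: NodeRole) {
    let entry = format!("{id} via {role:?} at {}", store_path.display());
    store.lock().expect("store mutex poisoned").push(entry);
}

fn serve(connections: usize) -> Vec<String> {
    let store = Arc::new(Mutex::new(Vec::new()));
    let store_path = Arc::new(PathBuf::from("store.pb")); // Arc<PathBuf>, not Arc<&Path>
    let role = NodeRole::Leader; // copied into each task instead of passing &role

    let mut handles = Vec::new();
    for id in 0..connections {
        let store = Arc::clone(&store);
        let store_path = Arc::clone(&store_path);
        // Spawn and move on: the loop does not wait for this connection to finish.
        handles.push(thread::spawn(move || handle_connection(id, store, store_path, role)));
    }
    for handle in handles {
        handle.join().expect("connection handler panicked");
    }

    let mut log = store.lock().expect("store mutex poisoned").clone();
    log.sort(); // handlers can finish in any order
    log
}

fn main() {
    for line in serve(3) {
        println!("{line}");
    }
}
```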

How can I mutate the HTML inside a hyper::Response? [duplicate]

I want to write a server using the current master branch of Hyper that saves a message that is delivered by a POST request and sends this message to every incoming GET request.
I have this, mostly copied from the Hyper examples directory:
extern crate futures;
extern crate hyper;
extern crate pretty_env_logger;
use futures::future::FutureResult;
use hyper::{Get, Post, StatusCode};
use hyper::header::{ContentLength};
use hyper::server::{Http, Service, Request, Response};
use futures::Stream;
struct Echo {
    data: Vec<u8>,
}
impl Echo {
    fn new() -> Self {
        Echo {
            data: "text".into(),
        }
    }
}
impl Service for Echo {
    type Request = Request;
    type Response = Response;
    type Error = hyper::Error;
    type Future = FutureResult<Response, hyper::Error>;
    fn call(&self, req: Self::Request) -> Self::Future {
        let resp = match (req.method(), req.path()) {
            (&Get, "/") | (&Get, "/echo") => {
                Response::new()
                    .with_header(ContentLength(self.data.len() as u64))
                    .with_body(self.data.clone())
            },
            (&Post, "/") => {
                //self.data.clear(); // argh. &self is not mutable :(
                // even if it was mutable... how to put the entire body into it?
                //req.body().fold(...) ?
                let mut res = Response::new();
                if let Some(len) = req.headers().get::<ContentLength>() {
                    res.headers_mut().set(ContentLength(0));
                }
                res.with_body(req.body())
            },
            _ => {
                Response::new()
                    .with_status(StatusCode::NotFound)
            }
        };
        futures::future::ok(resp)
    }
}
fn main() {
    pretty_env_logger::init().unwrap();
    let addr = "127.0.0.1:12346".parse().unwrap();
    let server = Http::new().bind(&addr, || Ok(Echo::new())).unwrap();
    println!("Listening on http://{} with 1 thread.", server.local_addr().unwrap());
    server.run().unwrap();
}
How do I turn the req.body() (which seems to be a Stream of Chunks) into a Vec<u8>? I assume I must somehow return a Future that consumes the Stream and turns it into a single Vec<u8>, maybe with fold(). But I have no clue how to do that.
Hyper 0.13 provides a body::to_bytes function for this purpose.
use hyper::body;
use hyper::{Body, Response};
pub async fn read_response_body(res: Response<Body>) -> Result<String, hyper::Error> {
    let bytes = body::to_bytes(res.into_body()).await?;
    Ok(String::from_utf8(bytes.to_vec()).expect("response was not valid utf-8"))
}
I'm going to simplify the problem to just return the total number of bytes, instead of echoing the entire stream.
Futures 0.3
Hyper 0.13 + TryStreamExt::try_fold
See euclio's answer about hyper::body::to_bytes if you just want all the data as one giant blob.
Accessing the stream allows for more fine-grained control:
use futures::TryStreamExt; // 0.3.7
use hyper::{server::Server, service, Body, Method, Request, Response}; // 0.13.9
use std::convert::Infallible;
use tokio; // 0.2.22
#[tokio::main]
async fn main() {
    let addr = "127.0.0.1:12346".parse().expect("Unable to parse address");
    let server = Server::bind(&addr).serve(service::make_service_fn(|_conn| async {
        Ok::<_, Infallible>(service::service_fn(echo))
    }));
    println!("Listening on http://{}.", server.local_addr());
    if let Err(e) = server.await {
        eprintln!("Error: {}", e);
    }
}
async fn echo(req: Request<Body>) -> Result<Response<Body>, hyper::Error> {
    let (parts, body) = req.into_parts();
    match (parts.method, parts.uri.path()) {
        (Method::POST, "/") => {
            let entire_body = body
                .try_fold(Vec::new(), |mut data, chunk| async move {
                    data.extend_from_slice(&chunk);
                    Ok(data)
                })
                .await;
            entire_body.map(|body| {
                let body = Body::from(format!("Read {} bytes", body.len()));
                Response::new(body)
            })
        }
        _ => {
            let body = Body::from("Can only POST to /");
            Ok(Response::new(body))
        }
    }
}
Unfortunately, the current implementation of Bytes is no longer compatible with TryStreamExt::try_concat, so we have to switch back to a fold.
Futures 0.1
hyper 0.12 + Stream::concat2
Since futures 0.1.14, you can use Stream::concat2 to stick together all the data into one:
fn concat2(self) -> Concat2<Self>
where
    Self: Sized,
    Self::Item: Extend<<Self::Item as IntoIterator>::Item> + IntoIterator + Default,
use futures::{
    future::{self, Either},
    Future, Stream,
}; // 0.1.25
use hyper::{server::Server, service, Body, Method, Request, Response}; // 0.12.20
use tokio; // 0.1.14
fn main() {
    let addr = "127.0.0.1:12346".parse().expect("Unable to parse address");
    let server = Server::bind(&addr).serve(|| service::service_fn(echo));
    println!("Listening on http://{}.", server.local_addr());
    let server = server.map_err(|e| eprintln!("Error: {}", e));
    tokio::run(server);
}
fn echo(req: Request<Body>) -> impl Future<Item = Response<Body>, Error = hyper::Error> {
    let (parts, body) = req.into_parts();
    match (parts.method, parts.uri.path()) {
        (Method::POST, "/") => {
            let entire_body = body.concat2();
            let resp = entire_body.map(|body| {
                let body = Body::from(format!("Read {} bytes", body.len()));
                Response::new(body)
            });
            Either::A(resp)
        }
        _ => {
            let body = Body::from("Can only POST to /");
            let resp = future::ok(Response::new(body));
            Either::B(resp)
        }
    }
}
You could also convert the concatenated body into a Vec<u8> with body.to_vec() inside the map closure, and then convert that to a String.
See also:
How do I convert a Vector of bytes (u8) to a string
hyper 0.11 + Stream::fold
Similar to Iterator::fold, Stream::fold takes an accumulator (called init) and a function that operates on the accumulator and an item from the stream. The result of the function must be another future with the same error type as the original. The total result is itself a future.
fn fold<F, T, Fut>(self, init: T, f: F) -> Fold<Self, F, Fut, T>
where
    F: FnMut(T, Self::Item) -> Fut,
    Fut: IntoFuture<Item = T>,
    Self::Error: From<Fut::Error>,
    Self: Sized,
We can use a Vec as the accumulator. Body's Stream implementation yields Chunks. Chunk implements Deref<Target = [u8]>, so we can use that to append each chunk's data to the Vec.
extern crate futures; // 0.1.23
extern crate hyper; // 0.11.27
use futures::{Future, Stream};
use hyper::{
    server::{Http, Request, Response, Service}, Post,
};
fn main() {
    let addr = "127.0.0.1:12346".parse().unwrap();
    let server = Http::new().bind(&addr, || Ok(Echo)).unwrap();
    println!(
        "Listening on http://{} with 1 thread.",
        server.local_addr().unwrap()
    );
    server.run().unwrap();
}
struct Echo;
impl Service for Echo {
    type Request = Request;
    type Response = Response;
    type Error = hyper::Error;
    type Future = Box<dyn futures::Future<Item = Response, Error = Self::Error>>;
    fn call(&self, req: Self::Request) -> Self::Future {
        match (req.method(), req.path()) {
            (&Post, "/") => {
                let f = req.body()
                    .fold(Vec::new(), |mut acc, chunk| {
                        acc.extend_from_slice(&*chunk);
                        futures::future::ok::<_, Self::Error>(acc)
                    })
                    .map(|body| Response::new().with_body(format!("Read {} bytes", body.len())));
                Box::new(f)
            }
            _ => panic!("Nope"),
        }
    }
}
You could also convert the Vec<u8> body to a String.
See also:
How do I convert a Vector of bytes (u8) to a string
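For that last conversion step, std offers a strict and a lenient option. A minimal sketch with hypothetical helper names: String::from_utf8 takes ownership of the Vec<u8> without copying and fails on invalid UTF-8, while String::from_utf8_lossy substitutes U+FFFD for invalid sequences instead of failing.

```rust
use std::string::FromUtf8Error;

// Strict: returns an error if the body is not valid UTF-8 (reuses the buffer on success).
fn body_to_string(body: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(body)
}

// Lenient: replaces invalid byte sequences with U+FFFD instead of failing.
fn body_to_string_lossy(body: &[u8]) -> String {
    String::from_utf8_lossy(body).into_owned()
}

fn main() {
    println!("{:?}", body_to_string(b"hello".to_vec()));
    println!("{}", body_to_string_lossy(&[b'h', 0xFF, b'i']));
}
```

Which one fits depends on whether a malformed body should be rejected or tolerated.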
Output
When called from the command line, we can see the result:
$ curl -X POST --data hello http://127.0.0.1:12346/
Read 5 bytes
Warning
All of these solutions allow a malicious end user to POST an infinitely sized file, which would cause the machine to run out of memory. Depending on the intended use, you may wish to establish some kind of cap on the number of bytes read, potentially writing to the filesystem at some breakpoint.
See also:
How do I apply a limit to the number of bytes read by futures::Stream::concat2?
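One way to add such a cap is to check the running length inside the fold and stop as soon as the limit would be exceeded. The sketch below is a synchronous, std-only version over an iterator of chunks (no hyper involved); collect_with_limit and BodyError are hypothetical names. With hyper 0.13 the same check could go inside the TryStreamExt::try_fold closure, after mapping the stream's error into a type you can construct yourself, since you generally cannot create a hyper::Error in user code.

```rust
#[derive(Debug, PartialEq)]
enum BodyError {
    TooLarge { limit: usize },
}

// Same accumulate-into-a-Vec fold as above, but it gives up once `limit` bytes would be exceeded.
fn collect_with_limit<I, C>(chunks: I, limit: usize) -> Result<Vec<u8>, BodyError>
where
    I: IntoIterator<Item = C>,
    C: AsRef<[u8]>,
{
    chunks.into_iter().try_fold(Vec::new(), |mut acc, chunk| {
        let chunk = chunk.as_ref();
        if acc.len() + chunk.len() > limit {
            return Err(BodyError::TooLarge { limit });
        }
        acc.extend_from_slice(chunk);
        Ok(acc)
    })
}

fn main() {
    let chunks = vec![b"he".to_vec(), b"llo".to_vec()];
    println!("{:?}", collect_with_limit(chunks, 3));
}
```

Because try_fold short-circuits on the first Err, no further chunks are read or buffered once the limit is hit.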
Most of the answers on this topic are outdated or overly complicated. The solution is pretty simple:
// The HttpBody trait must be in scope, because .data() is defined on it.
use hyper::body::HttpBody;
let my_vector: Vec<u8> = request.into_body().data().await.unwrap().unwrap().to_vec();
let my_string = String::from_utf8(my_vector).unwrap();
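Note that data() yields only the next chunk of the body (or None once the body is finished), so this reads just the first chunk of a body that arrives in several pieces, and the first unwrap() panics on an empty body. body::to_bytes, as euclio answered, collects the whole body. Either way, don't forget to handle the unwraps properly.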
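Because data() returns one chunk per call, reading a whole body means looping until it returns None. The std-only sketch below shows that loop, with Iterator::next standing in for data().await; read_whole_body is a hypothetical name and String is a placeholder error type. Assuming hyper 0.13/0.14 with HttpBody in scope, the async loop body would have the same shape: while let Some(chunk) = body.data().await { out.extend_from_slice(&chunk?); }

```rust
// `body.next()` plays the role of `body.data().await`: each call yields the next chunk,
// or None once the body is finished.
fn read_whole_body<I>(mut body: I) -> Result<Vec<u8>, String>
where
    I: Iterator<Item = Result<Vec<u8>, String>>,
{
    let mut out = Vec::new();
    while let Some(chunk) = body.next() {
        // `?` stops at the first chunk error instead of panicking like unwrap()
        out.extend_from_slice(&chunk?);
    }
    Ok(out)
}

fn main() {
    let chunks: Vec<Result<Vec<u8>, String>> = vec![Ok(b"hel".to_vec()), Ok(b"lo".to_vec())];
    println!("{:?}", read_whole_body(chunks.into_iter()));
}
```

An empty body simply yields an empty Vec here rather than a panic.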

Resources