Downloading lots of files asynchronously

(The original code is taken from http://patshaughnessy.net/2020/1/20/downloading-100000-files-using-async-rust)
I have some code that downloads images concurrently in Rust, but I'm not sure how to implement two things. The first is rate limiting, to avoid 429 (Too Many Requests) responses. I tried using std::thread::sleep, but it does not work as I expect (the same thing happens with tokio::time::sleep). I did not write the original code and am new to async Rust, so I'm unsure of the specifics of how and when the code runs.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let paths: Vec<String> = read_lines("links.txt")?;
    let fetches = futures::stream::iter(
        paths.into_iter().map(|path| {
            async move {
                let a = path.split('/').collect::<Vec<&str>>();
                let file_name = a.last().unwrap();
                tokio::time::sleep(tokio::time::Duration::from_secs(1));
                match reqwest::get(&path).await {
                    Ok(resp) => {
                        if resp.status().as_u16() != 200 {
                            println!("failed to download");
                            println!("{}", path);
                        };
                        match resp.bytes().await {
                            Ok(bytes) => {
                                //println!("RESPONSE: {} bytes from {}", (bytes.len()), path);
                                write(format!("downloads/{}", file_name), bytes).unwrap();
                            }
                            Err(_) => println!("ERROR reading {}", path),
                        }
                    }
                    Err(_) => println!("ERROR downloading {}", path),
                }
            }
        })
    ).buffer_unordered(200).collect::<Vec<()>>();
    fetches.await;
    Ok(())
}
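Two things are likely going on with the sleep above. Futures in Rust are lazy, so tokio::time::sleep(...) on its own does nothing; it only pauses the task when it is awaited (tokio::time::sleep(...).await). std::thread::sleep, by contrast, blocks the thread, including every other future being polled on it. And even with .await, each of the up to 200 futures that buffer_unordered(200) runs at once would sleep for the same second and then fire, so requests are delayed but not spaced out. One way to space them out is a shared pacer that hands every request its own start slot. The sketch below shows that idea with std only; Pacer, reserve and wait are hypothetical names, and in the tokio code you would share it (for example via an Arc, or by reference like `client` in the related question below) and await tokio::time::sleep_until(slot.into()) instead of calling the blocking wait.

```rust
use std::sync::Mutex;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: hands out start times spaced at least `interval` apart,
/// no matter how many callers ask for a slot concurrently.
pub struct Pacer {
    interval: Duration,
    next: Mutex<Instant>,
}

impl Pacer {
    pub fn new(interval: Duration) -> Self {
        Pacer { interval, next: Mutex::new(Instant::now()) }
    }

    /// Reserve the next free slot. The lock is held only while `next` is
    /// updated, never while anyone sleeps.
    pub fn reserve(&self) -> Instant {
        let mut next = self.next.lock().unwrap();
        let slot = (*next).max(Instant::now());
        *next = slot + self.interval;
        slot
    }

    /// Blocking variant for plain threads: sleep until the reserved slot.
    pub fn wait(&self) {
        let slot = self.reserve();
        let now = Instant::now();
        if slot > now {
            thread::sleep(slot - now);
        }
    }
}
```

With a one-second interval, this caps the download rate at roughly one request per second regardless of the buffer_unordered limit; the limit then only bounds how many requests can be in flight at once.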
Second, I want to collect a list of the links that were rate limited (returned a 429 status) and print it once all the other files have finished downloading. My attempt and the resulting compiler error are as follows:
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut failed: Vec<String> = Vec::new();
    let paths: Vec<String> = read_lines("links.txt")?;
    let fetches = futures::stream::iter(
        paths.into_iter().map(|path| {
            async move {
                let a = path.split('/').collect::<Vec<&str>>();
                let file_name = a.last().unwrap();
                match reqwest::get(&path).await {
                    Ok(resp) => {
                        if resp.status().as_u16() != 200 {
                            println!("failed to download {}", path);
                            failed.push(path);
                            return // To avoid downloading the file, which will not contain what we want.
                        };
                        match resp.bytes().await {
                            Ok(bytes) => {
                                //println!("RESPONSE: {} bytes from {}", (bytes.len()), path);
                                write(format!("downloads/{}", file_name), bytes).unwrap();
                            }
                            Err(_) => println!("ERROR reading {}", path),
                        }
                    }
                    Err(_) => println!("ERROR downloading {}", path),
                }
            }
        })
    ).buffer_unordered(200).collect::<Vec<()>>();
    fetches.await;
    Ok(())
}
error[E0507]: cannot move out of `failed`, a captured variable in an `FnMut` closure
--> src/main.rs:28:20
|
24 | let mut failed: Vec<String> = Vec::new();
| ---------- captured outer variable
...
28 | async move {
| ____________________^
29 | | let a = path.split('/').collect::<Vec<&str>>();
30 | | let file_name = a.last().unwrap();
31 | | match reqwest::get(&path).await {
... |
35 | | failed.push(path);
| | ------
| | |
| | move occurs because `failed` has type `Vec<String>`, which does not implement the `Copy` trait
| | move occurs due to use in generator
... |
47 | | }
48 | | }
| |_________^ move out of `failed` occurs here
How can I do this?
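The error arises because `async move` moves `failed` into the first future the closure creates, and an FnMut closure that runs once per path cannot hand the same Vec over again. A common way around this is to avoid shared mutation: let each async block evaluate to something like Option<String> (Some(path) when the status is reqwest::StatusCode::TOO_MANY_REQUESTS), collect the stream into a Vec<Option<String>>, and keep the Some values once everything has finished. Giving each future a clone of an Arc<Mutex<Vec<String>>> also works. The sketch below shows the collect-afterwards pattern with std scoped threads instead of tokio and reqwest, so it runs without extra crates; collect_rate_limited is a hypothetical name and the status_of closure stands in for the real request.

```rust
use std::thread;

/// Each worker returns `Some(path)` when its request was rate limited (HTTP 429)
/// instead of pushing into a shared Vec; the caller gathers the results after
/// every worker has finished. `status_of` stands in for the real HTTP request.
pub fn collect_rate_limited<F>(paths: Vec<String>, status_of: F) -> Vec<String>
where
    F: Fn(&str) -> u16 + Sync,
{
    let status_of = &status_of;
    thread::scope(|s| {
        // Start one worker per path; each worker owns its `path`.
        let handles: Vec<_> = paths
            .into_iter()
            .map(|path| s.spawn(move || (status_of(&path) == 429).then_some(path)))
            .collect();
        // Join in submission order and keep only the rate-limited paths.
        handles
            .into_iter()
            .filter_map(|handle| handle.join().unwrap())
            .collect::<Vec<String>>()
    })
}
```

In the tokio version the shape is the same: the async block's last expression becomes the Option, `.collect::<Vec<Option<String>>>()` replaces `.collect::<Vec<()>>()`, and the failed list is `results.into_iter().flatten().collect::<Vec<_>>()`.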

Related

How can I return a vector of responses from a set of concurrent GET requests?

I'm just getting started with Rust and am trying to work with concurrent requests. My aim is to have an asynchronous function that returns a vector of responses from a number of GET requests. My current code successfully executes the requests concurrently, but does not return anything from the function:
// main.rs
mod api_requester;
#[tokio::main]
async fn main() {
    let values = vec!["dog".to_string(), "cat".to_string(), "bear".to_string()];
    let responses = api_requester::get_data(values).await;
    println!("{:?}", responses)
    // do more stuff with responses
}
// api_requester.rs
use serde::Deserialize;
use futures::{stream, StreamExt};
use reqwest::Client;
const CONCURRENT_REQUESTS: usize = 2;
const API_ENDPOINT: &str = "https://httpbin.org/response-headers";
#[derive(Deserialize, Debug)]
struct ApiResponse {
    #[serde(rename(deserialize = "Content-Length"))]
    content_length: String,
    #[serde(rename(deserialize = "Content-Type"))]
    content_type: String,
    freeform: String
}
pub async fn get_data(values: Vec<String>) {
    let client = Client::new();
    let bodies = stream::iter(values)
        .map(|value| {
            let client = &client;
            async move {
                let resp = client.get(API_ENDPOINT)
                    .query(&[("freeform", value)])
                    .send().await?;
                resp.json::<ApiResponse>().await
            }
        })
        .buffer_unordered(CONCURRENT_REQUESTS);
    bodies
        .for_each(|body| async {
            match body {
                Ok(body) => println!("Got {:?}", body),
                Err(e) => eprintln!("Got an error: {:?}", e),
            }
        })
        .await;
}
My goal is to return a vector of the responses received from the GET requests to the main function for further use, but this is where I'm seriously confused. I thought I would be able to just await the value in the function and return the vector once the futures have resolved, something like this:
// api_requester.rs
...
pub async fn get_data(values: Vec<String>) -> Vec<ApiResponse> {
    let client = Client::new();
    let bodies = stream::iter(values)
        .map(|value| {
            let client = &client;
            async move {
                let resp = client.get(API_ENDPOINT)
                    .query(&[("freeform", value)])
                    .send().await?;
                resp.json::<ApiResponse>().await
            }
        })
        .buffer_unordered(CONCURRENT_REQUESTS);
    bodies
}
This produces the following error:
error[E0308]: mismatched types
--> src/api_requester.rs:37:5
|
24 | .map(|value| {
| ------- the found closure
...
27 | async move {
| ________________________-
28 | | let resp = client.get(API_ENDPOINT)
29 | | .query(&[("freeform", value)])
30 | | .send().await?;
31 | |
32 | | resp.json::<ApiResponse>().await
33 | | }
| |_____________- the found `async` block
...
37 | bodies
| ^^^^^^ expected struct `Vec`, found struct `BufferUnordered`
|
::: /home/seraub/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/mod.rs:72:43
|
72 | pub const fn from_generator<T>(gen: T) -> impl Future<Output = T::Return>
| ------------------------------- the found opaque type
|
= note: expected struct `Vec<ApiResponse>`
found struct `BufferUnordered<futures::stream::Map<futures::stream::Iter<std::vec::IntoIter<std::string::String>>, [closure#src/api_requester.rs:24:14: 24:21]>>`
For more information about this error, try `rustc --explain E0308`.
I'm guessing that the BufferUnordered<futures::stream::Map<futures::stream::Iter<std::vec::IntoIter<std::string::String>>, [closure#src/api_requester.rs:24:14: 24:21]>> struct found by the compiler needs to be realized/completed?
How can I turn this BufferUnordered object into a simple vector of responses and return it back to the main function?
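A BufferUnordered is a stream, and nothing is produced until that stream is driven to completion. In the question's code, the likely fix is to end get_data with `bodies.collect::<Vec<_>>().await` (StreamExt::collect from the futures crate) and to change the return type to Vec<Result<ApiResponse, reqwest::Error>>, since each async block evaluates to a Result; ApiResponse probably also needs to be `pub`, because it appears in a public function's signature. If only the successes are wanted, they can be filtered out afterwards. The underlying idea, "at most N requests in flight, gather every result into a Vec", can be sketched with std scoped threads so it runs without extra crates. map_concurrent is a hypothetical helper, not a futures API; like buffer_unordered, it returns results in completion order.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical helper: apply `f` to every input with at most `limit` calls in
/// flight, and return all results (in completion order, like `buffer_unordered`).
pub fn map_concurrent<T, R, F>(inputs: Vec<T>, limit: usize, f: F) -> Vec<R>
where
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync,
{
    // Shared work queue and result buffer; each lock is held only briefly.
    let queue = Mutex::new(inputs.into_iter());
    let results = Mutex::new(Vec::new());
    thread::scope(|s| {
        // `limit` workers keep pulling inputs until the queue is empty.
        for _ in 0..limit.max(1) {
            s.spawn(|| loop {
                // The queue guard is a temporary, dropped at the end of this statement.
                let next = queue.lock().unwrap().next();
                match next {
                    Some(item) => {
                        let out = f(item);
                        results.lock().unwrap().push(out);
                    }
                    None => break,
                }
            });
        }
    }); // all workers are joined when the scope ends
    results.into_inner().unwrap()
}
```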

How do I borrow a mutable reference from an Arc<Mutex<T>> and call an async member function on it?

I have a simple struct and an implementation that looks like this.
#[derive(Debug)]
struct MyStruct {
    data: u64,
}
impl MyStruct {
    async fn something_async(&mut self) -> Result<(), Box<dyn Error>> {
        self.data += 1;
        Ok(())
    }
}
I want to use MyStruct on the heap using a smart pointer and a mutex so that I can use it from multiple threads.
However, when I try to call the async function called something_async()...
tokio::spawn(async move {
    let ptr = Arc::new(Mutex::new(MyStruct { data: 1 }));
    let mut s = ptr.lock().unwrap();
    s.something_async().await.unwrap();
    println!("{:?}", s);
});
...I get the following error:
error: future cannot be sent between threads safely
--> src/main.rs:18:5
|
18 | tokio::spawn(async move {
| ^^^^^^^^^^^^ future created by async block is not `Send`
|
= help: within `impl Future<Output = [async output]>`, the trait `Send` is not implemented for `std::sync::MutexGuard<'_, MyStruct>`
note: future is not `Send` as this value is used across an await
--> src/main.rs:21:9
|
20 | let mut s = ptr.lock().unwrap();
| ----- has type `std::sync::MutexGuard<'_, MyStruct>` which is not `Send`
21 | s.something_async().await.unwrap();
| ^^^^^^^^^^^^^^^^^^^^^^^^ await occurs here, with `mut s` maybe used later
22 | println!("{:?}", s);
23 | });
| - `mut s` is later dropped here
note: required by a bound in `tokio::spawn`
--> /playground/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.16.1/src/task/spawn.rs:127:21
|
127 | T: Future + Send + 'static,
| ^^^^ required by this bound in `tokio::spawn`
I am assuming that the compiler does not like me keeping a MutexGuard held across an await, so I tried this instead:
let fut = {
    let mut s = ptr.lock().unwrap();
    s.something_async()
};
fut.await.unwrap();
But then, of course, it complains that s does not live long enough, because the returned future still borrows it:
error[E0597]: `s` does not live long enough
--> src/main.rs:22:13
|
20 | let fut = {
| --- borrow later stored here
21 | let mut s = ptr.lock().unwrap();
22 | s.something_async()
| ^^^^^^^^^^^^^^^^^^ borrowed value does not live long enough
23 | };
| - `s` dropped here while still borrowed
How do I call an async method on an object wrapped in an Arc<Mutex<T>>?
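Make MyStruct copyable. Dereferencing the guard then copies the value out, and the temporary MutexGuard is dropped at the end of that statement, so no guard is held across the .await: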
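use std::error::Error;
use std::sync::{Arc, Mutex};
#[derive(Debug, Copy, Clone)]
struct MyStruct {
    data: u64,
}
impl MyStruct {
    async fn something_async(&mut self) -> Result<(), Box<dyn Error>> {
        self.data += 1;
        Ok(())
    }
}
#[tokio::main]
async fn main() {
    tokio::spawn(async move {
        let ptr = Arc::new(Mutex::new(MyStruct { data: 1 }));
        // `*` copies the struct out; the guard is dropped before the `.await`.
        let mut s = *ptr.lock().unwrap();
        // This mutates the copy, not the value stored in the mutex.
        s.something_async().await.unwrap();
        println!("{:?}", s);
    })
    .await
    .unwrap(); // wait for the task so `main` does not return before it runs
}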
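One caveat with the answer above: dereferencing the guard copies the struct, so something_async increments the copy and the value inside the mutex stays at 1. If the goal is to update the shared value, one option is to write the copy back after the .await, as in the std-only sketch below. bump is a hypothetical helper name, and block_on is a minimal single-future executor modeled on the example in the std::task::Wake documentation, included only so the sketch runs without tokio.

```rust
use std::error::Error;
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

#[derive(Debug, Copy, Clone)]
pub struct MyStruct {
    pub data: u64,
}

impl MyStruct {
    pub async fn something_async(&mut self) -> Result<(), Box<dyn Error>> {
        self.data += 1;
        Ok(())
    }
}

/// Hypothetical helper: copy the value out (the guard is dropped at the end of
/// that statement), await on the copy, then lock again to store the result.
pub async fn bump(shared: &Mutex<MyStruct>) -> Result<(), Box<dyn Error>> {
    let mut copy = *shared.lock().unwrap();
    copy.something_async().await?;
    *shared.lock().unwrap() = copy;
    Ok(())
}

/// Waker that unparks the thread which is blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor: poll one future on the current thread until it completes.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

Because the lock is released while awaiting, another task could change the value in between, and the write-back would overwrite that change. If that matters, tokio::sync::Mutex is the usual alternative, since its guard is meant to be held across .await points.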

How to correctly call async functions in a WebSocket handler in Actix-web

I have made some progress with this, using into_actor().spawn(), but I am struggling to access the ctx variable inside the async block.
I'll start by showing a compiling snippet of the WebSocket handler, then a failing snippet of the handler, and then, for reference, the full code example.
Working snippet:
Focus on the match arm Ok(ws::Message::Text(text)):
/// Handler for `ws::Message`
impl StreamHandler<Result<ws::Message, ws::ProtocolError>> for MyWebSocket {
fn handle(&mut self, msg: Result<ws::Message, ws::ProtocolError>, ctx: &mut Self::Context) {
// process websocket messages
println!("WS: {:?}", msg);
match msg {
Ok(ws::Message::Ping(msg)) => {
self.hb = Instant::now();
ctx.pong(&msg);
}
Ok(ws::Message::Pong(_)) => {
self.hb = Instant::now();
}
Ok(ws::Message::Text(text)) => {
let future = async move {
let reader = processrunner::run_process(text).await;
let mut reader = reader.ok().unwrap();
while let Some(line) = reader.next_line().await.unwrap() {
// ctx.text(line);
println!("line = {}", line);
}
};
future.into_actor(self).spawn(ctx);
}
Ok(ws::Message::Binary(bin)) => ctx.binary(bin),
Ok(ws::Message::Close(reason)) => {
ctx.close(reason);
ctx.stop();
}
_ => ctx.stop(),
}
}
}
Non-working snippet, with the ctx.text(line) line uncommented:
/// Handler for `ws::Message`
impl StreamHandler<Result<ws::Message, ws::ProtocolError>> for MyWebSocket {
fn handle(&mut self, msg: Result<ws::Message, ws::ProtocolError>, ctx: &mut Self::Context) {
// process websocket messages
println!("WS: {:?}", msg);
match msg {
Ok(ws::Message::Ping(msg)) => {
self.hb = Instant::now();
ctx.pong(&msg);
}
Ok(ws::Message::Pong(_)) => {
self.hb = Instant::now();
}
Ok(ws::Message::Text(text)) => {
let future = async move {
let reader = processrunner::run_process(text).await;
let mut reader = reader.ok().unwrap();
while let Some(line) = reader.next_line().await.unwrap() {
ctx.text(line);
println!("line = {}", line);
}
};
future.into_actor(self).spawn(ctx);
}
Ok(ws::Message::Binary(bin)) => ctx.binary(bin),
Ok(ws::Message::Close(reason)) => {
ctx.close(reason);
ctx.stop();
}
_ => ctx.stop(),
}
}
}
The full code, split over two files:
main.rs
//! Simple echo websocket server.
//! Open `http://localhost:8080/ws/index.html` in browser
//! or [python console client](https://github.com/actix/examples/blob/master/websocket/websocket-client.py)
//! could be used for testing.
mod processrunner;
use std::time::{Duration, Instant};
use actix::prelude::*;
use actix_files as fs;
use actix_web::{middleware, web, App, Error, HttpRequest, HttpResponse, HttpServer};
use actix_web_actors::ws;
/// How often heartbeat pings are sent
const HEARTBEAT_INTERVAL: Duration = Duration::from_secs(5);
/// How long before lack of client response causes a timeout
const CLIENT_TIMEOUT: Duration = Duration::from_secs(10);
/// do websocket handshake and start `MyWebSocket` actor
async fn ws_index(r: HttpRequest, stream: web::Payload) -> Result<HttpResponse, Error> {
println!("{:?}", r);
let res = ws::start(MyWebSocket::new(), &r, stream);
println!("{:?}", res);
res
}
/// A websocket connection is a long-running connection, so it is easier
/// to handle with an actor
struct MyWebSocket {
/// Client must send ping at least once per 10 seconds (CLIENT_TIMEOUT),
/// otherwise we drop connection.
hb: Instant,
}
impl Actor for MyWebSocket {
type Context = ws::WebsocketContext<Self>;
/// Method is called on actor start. We start the heartbeat process here.
fn started(&mut self, ctx: &mut Self::Context) {
self.hb(ctx);
}
}
/// Handler for `ws::Message`
impl StreamHandler<Result<ws::Message, ws::ProtocolError>> for MyWebSocket {
fn handle(&mut self, msg: Result<ws::Message, ws::ProtocolError>, ctx: &mut Self::Context) {
// process websocket messages
println!("WS: {:?}", msg);
match msg {
Ok(ws::Message::Ping(msg)) => {
self.hb = Instant::now();
ctx.pong(&msg);
}
Ok(ws::Message::Pong(_)) => {
self.hb = Instant::now();
}
Ok(ws::Message::Text(text)) => {
let future = async move {
let reader = processrunner::run_process(text).await;
let mut reader = reader.ok().unwrap();
while let Some(line) = reader.next_line().await.unwrap() {
// ctx.text(line);
println!("line = {}", line);
}
};
future.into_actor(self).spawn(ctx);
}
Ok(ws::Message::Binary(bin)) => ctx.binary(bin),
Ok(ws::Message::Close(reason)) => {
ctx.close(reason);
ctx.stop();
}
_ => ctx.stop(),
}
}
}
impl MyWebSocket {
fn new() -> Self {
Self { hb: Instant::now() }
}
/// helper method that sends a ping to the client every HEARTBEAT_INTERVAL (5 seconds).
///
/// also this method checks heartbeats from client
fn hb(&self, ctx: &mut <Self as Actor>::Context) {
ctx.run_interval(HEARTBEAT_INTERVAL, |act, ctx| {
// check client heartbeats
if Instant::now().duration_since(act.hb) > CLIENT_TIMEOUT {
// heartbeat timed out
println!("Websocket Client heartbeat failed, disconnecting!");
// stop actor
ctx.stop();
// don't try to send a ping
return;
}
ctx.ping(b"");
});
}
}
#[actix_web::main]
async fn main() -> std::io::Result<()> {
std::env::set_var("RUST_LOG", "actix_server=info,actix_web=info");
env_logger::init();
HttpServer::new(|| {
App::new()
// enable logger
.wrap(middleware::Logger::default())
// websocket route
.service(web::resource("/ws/").route(web::get().to(ws_index)))
// static files
.service(fs::Files::new("/", "static/").index_file("index.html"))
})
// start http server on 127.0.0.1:8080
.bind("127.0.0.1:8080")?
.run()
.await
}
processrunner.rs
extern crate tokio;
use tokio::io::*;
use tokio::process::Command;
use std::process::Stdio;
//#[tokio::main]
pub async fn run_process(
text: String,
) -> std::result::Result<
tokio::io::Lines<BufReader<tokio::process::ChildStdout>>,
Box<dyn std::error::Error>,
> {
let mut cmd = Command::new(text);
cmd.stdout(Stdio::piped());
let mut child = cmd.spawn().expect("failed to spawn command");
let stdout = child
.stdout
.take()
.expect("child did not have a handle to stdout");
let lines = BufReader::new(stdout).lines();
// Ensure the child process is spawned in the runtime so it can
// make progress on its own while we await for any output.
tokio::spawn(async {
let status = child.await.expect("child process encountered an error");
println!("child status was: {}", status);
});
Ok(lines)
}
Error:
error[E0495]: cannot infer an appropriate lifetime due to conflicting requirements
--> src/main.rs:57:41
|
57 | let future = async move {
| _________________________________________^
58 | | let reader = processrunner::run_process(text).await;
59 | | let mut reader = reader.ok().unwrap();
60 | | while let Some(line) = reader.next_line().await.unwrap() {
... |
63 | | }
64 | | };
| |_________________^
|
note: first, the lifetime cannot outlive the anonymous lifetime #2 defined on the method body at 45:5...
--> src/main.rs:45:5
|
45 | / fn handle(&mut self, msg: Result<ws::Message, ws::ProtocolError>, ctx: &mut Self::Context) {
46 | | // process websocket messages
47 | | println!("WS: {:?}", msg);
48 | | match msg {
... |
74 | | }
75 | | }
| |_____^
note: ...so that the types are compatible
--> src/main.rs:57:41
|
57 | let future = async move {
| _________________________________________^
58 | | let reader = processrunner::run_process(text).await;
59 | | let mut reader = reader.ok().unwrap();
60 | | while let Some(line) = reader.next_line().await.unwrap() {
... |
63 | | }
64 | | };
| |_________________^
= note: expected `&mut actix_web_actors::ws::WebsocketContext<MyWebSocket>`
found `&mut actix_web_actors::ws::WebsocketContext<MyWebSocket>`
= note: but, the lifetime must be valid for the static lifetime...
note: ...so that the type `actix::fut::FutureWrap<impl std::future::Future, MyWebSocket>` will meet its required lifetime bounds
--> src/main.rs:66:41
|
66 | future.into_actor(self).spawn(ctx);
| ^^^^^
error: aborting due to previous error
For more information about this error, try `rustc --explain E0495`.
Cargo.toml:
[package]
name = "removed"
version = "0.1.0"
authors = ["removed"]
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
tokio = { version = "0.2", features = ["full"] }
actix = "0.10"
actix-codec = "0.3"
actix-web = "3"
actix-web-actors = "3"
actix-files = "0.3"
awc = "2"
env_logger = "0.7"
futures = "0.3.1"
bytes = "0.5.3"
Here are the basics. You may need to adjust a few things here and there, but this works.
use actix::prelude::*;
use tokio::process::Command;
use actix_web::{ web, App, Error, HttpRequest, HttpResponse, HttpServer};
use actix_web_actors::ws;
use tokio::io::{ AsyncBufReadExt};
use actix::AsyncContext;
use tokio::stream::{ StreamExt};
use tokio::io::{BufReader};
use std::process::Stdio;
#[derive(Message)]
#[rtype(result = "Result<(), ()>")]
struct CommandRunner(String);
/// Define HTTP actor
struct MyWs;
impl Actor for MyWs {
type Context = ws::WebsocketContext<Self>;
}
#[derive(Debug)]
struct Line(String);
impl StreamHandler<Result<Line, ws::ProtocolError>> for MyWs {
fn handle(
&mut self,
msg: Result<Line, ws::ProtocolError>,
ctx: &mut Self::Context,
) {
match msg {
Ok(line) => ctx.text(line.0),
_ => () //Handle errors
}
}
}
/// Handler for ws::Message message
impl StreamHandler<Result<ws::Message, ws::ProtocolError>> for MyWs {
fn handle(
&mut self,
msg: Result<ws::Message, ws::ProtocolError>,
ctx: &mut Self::Context,
) {
match msg {
Ok(ws::Message::Ping(msg)) => ctx.pong(&msg),
Ok(ws::Message::Text(text)) => {
ctx.notify(CommandRunner(text.to_string()));
},
Ok(ws::Message::Binary(bin)) => ctx.binary(bin),
_ => (),
}
}
}
impl Handler<CommandRunner> for MyWs {
type Result = Result<(), ()>;
fn handle(&mut self, msg: CommandRunner, ctx: &mut Self::Context) -> Self::Result {
let mut cmd = Command::new(msg.0);
// Specify that we want the command's standard output piped back to us.
// By default, standard input/output/error will be inherited from the
// current process (for example, this means that standard input will
// come from the keyboard and standard output/error will go directly to
// the terminal if this process is invoked from the command line).
cmd.stdout(Stdio::piped());
let mut child = cmd.spawn()
.expect("failed to spawn command");
let stdout = child.stdout.take()
.expect("child did not have a handle to stdout");
let reader = BufReader::new(stdout).lines();
// Ensure the child process is spawned in the runtime so it can
// make progress on its own while we await for any output.
let fut = async move {
let status = child.await
.expect("child process encountered an error");
println!("child status was: {}", status);
};
let fut = actix::fut::wrap_future::<_, Self>(fut);
ctx.spawn(fut);
ctx.add_stream(reader.map(|l| Ok(Line(l.expect("Not a line")))));
Ok(())
}
}
async fn index(req: HttpRequest, stream: web::Payload) -> Result<HttpResponse, Error> {
let resp = ws::start(MyWs {}, &req, stream);
println!("{:?}", resp);
resp
}
#[actix_web::main]
async fn main() -> std::io::Result<()> {
HttpServer::new(|| App::new().route("/ws/", web::get().to(index)))
.bind("127.0.0.1:8080")?
.run()
.await
}
Running ls looks like this: [screenshot not preserved]
I understood what was going wrong at the same time that I discovered the accepted answer.
The accepted answer proposes a clean solution, but I thought I would offer an alternative viewpoint: the code snippet below makes fewer changes to my original attempt (as shown in the question), in the hope that it demonstrates my fundamental misunderstanding.
The fundamental issue with my code is that I was ignoring the rule that "every actor has its own context". As the compile error in the question shows, Actix fortunately relies on the Rust compiler to enforce this rule.
Now that I understand this, it looks like what I was wrongly trying to do was spawn another actor and have it somehow move or copy in the original actor's context, just so it could respond with the process output lines. There is no need for this, of course, because the actor model is all about letting actors communicate through messages.
Instead, when spawning a new actor, I should have passed it the address of the original actor, allowing the newly spawned actor to send updates back. The original actor handles these messages (struct Line below) with a handler.
As I said, the accepted answer also does this, but it maps the reader into a stream with ctx.add_stream, which looks like a more elegant solution than my loop.
mod processrunner;
use std::time::{Duration, Instant};
use actix::prelude::*;
use actix_files as fs;
use actix_web::{middleware, web, App, Error, HttpRequest, HttpResponse, HttpServer};
use actix_web_actors::ws;
/// How often heartbeat pings are sent
const HEARTBEAT_INTERVAL: Duration = Duration::from_secs(5);
/// How long before lack of client response causes a timeout
const CLIENT_TIMEOUT: Duration = Duration::from_secs(10);
/// do websocket handshake and start `MyWebSocket` actor
async fn ws_index(r: HttpRequest, stream: web::Payload) -> Result<HttpResponse, Error> {
println!("{:?}", r);
let res = ws::start(MyWebSocket::new(), &r, stream);
println!("{:?}", res);
res
}
/// A websocket connection is a long-running connection, so it is easier
/// to handle with an actor
struct MyWebSocket {
/// Client must send ping at least once per 10 seconds (CLIENT_TIMEOUT),
/// otherwise we drop connection.
hb: Instant,
}
impl Actor for MyWebSocket {
type Context = ws::WebsocketContext<Self>;
/// Method is called on actor start. We start the heartbeat process here.
fn started(&mut self, ctx: &mut Self::Context) {
self.hb(ctx);
}
}
#[derive(Message)]
#[rtype(result = "()")]
pub struct Line {
line: String,
}
impl Handler<Line> for MyWebSocket {
type Result = ();
fn handle(&mut self, msg: Line, ctx: &mut Self::Context) {
ctx.text(msg.line);
}
}
/// Handler for `ws::Message`
impl StreamHandler<Result<ws::Message, ws::ProtocolError>> for MyWebSocket {
fn handle(&mut self, msg: Result<ws::Message, ws::ProtocolError>, ctx: &mut Self::Context) {
// process websocket messages
println!("WS: {:?}", msg);
match msg {
Ok(ws::Message::Ping(msg)) => {
self.hb = Instant::now();
ctx.pong(&msg);
}
Ok(ws::Message::Pong(_)) => {
self.hb = Instant::now();
}
Ok(ws::Message::Text(text)) => {
let recipient = ctx.address().recipient();
let future = async move {
let reader = processrunner::run_process(text).await;
let mut reader = reader.ok().unwrap();
while let Some(line) = reader.next_line().await.unwrap() {
println!("line = {}", line);
recipient.do_send(Line { line });
}
};
future.into_actor(self).spawn(ctx);
}
Ok(ws::Message::Binary(bin)) => ctx.binary(bin),
Ok(ws::Message::Close(reason)) => {
ctx.close(reason);
ctx.stop();
}
_ => ctx.stop(),
}
}
}
impl MyWebSocket {
fn new() -> Self {
Self { hb: Instant::now() }
}
/// helper method that sends a ping to the client every HEARTBEAT_INTERVAL (5 seconds).
///
/// also this method checks heartbeats from client
fn hb(&self, ctx: &mut <Self as Actor>::Context) {
ctx.run_interval(HEARTBEAT_INTERVAL, |act, ctx| {
// check client heartbeats
if Instant::now().duration_since(act.hb) > CLIENT_TIMEOUT {
// heartbeat timed out
println!("Websocket Client heartbeat failed, disconnecting!");
// stop actor
ctx.stop();
// don't try to send a ping
return;
}
ctx.ping(b"");
});
}
}
#[actix_web::main]
async fn main() -> std::io::Result<()> {
std::env::set_var("RUST_LOG", "actix_server=info,actix_web=info");
env_logger::init();
HttpServer::new(|| {
App::new()
// enable logger
.wrap(middleware::Logger::default())
// websocket route
.service(web::resource("/ws/").route(web::get().to(ws_index)))
// static files
.service(fs::Files::new("/", "static/").index_file("index.html"))
})
// start http server on 127.0.0.1:8080
.bind("127.0.0.1:8080")?
.run()
.await
}
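The same principle (the background work never touches the actor's context; it only sends messages back to an address) can be illustrated without Actix using a std channel. In this sketch the spawned thread plays the role of the spawned future, tx.send plays the role of recipient.do_send(Line { line }), and the receiving loop plays the role of the Handler<Line> that calls ctx.text. run_worker is a hypothetical name, and the Vec it returns stands in for the lines sent to the client.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical message type, mirroring the answer's `Line` message.
pub struct Line {
    pub line: String,
}

/// Produces "process output" on a worker thread that only sends messages back;
/// the owner handles them with its own state, as the actor does with `ctx.text`.
pub fn run_worker(output: Vec<String>) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<Line>();
    let worker = thread::spawn(move || {
        for line in output {
            // Counterpart of `recipient.do_send(Line { line })`.
            if tx.send(Line { line }).is_err() {
                break; // the receiver is gone, like a stopped actor
            }
        }
        // `tx` is dropped here, which ends the receiving loop below.
    });
    // Counterpart of `Handler<Line>`: only the owner touches its own state.
    let mut sent_to_client = Vec::new();
    for msg in rx {
        sent_to_client.push(msg.line);
    }
    worker.join().unwrap();
    sent_to_client
}
```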

How to asynchronously explore a directory and its sub-directories?

I need to explore a directory and all its sub-directories. I can explore the directory easily with recursion in a synchronous way:
use failure::Error;
use std::fs;
use std::path::Path;
fn main() -> Result<(), Error> {
    visit(Path::new("."))
}
fn visit(path: &Path) -> Result<(), Error> {
    for e in fs::read_dir(path)? {
        let e = e?;
        let path = e.path();
        if path.is_dir() {
            visit(&path)?;
        } else if path.is_file() {
            println!("File: {:?}", path);
        }
    }
    Ok(())
}
When I try to do the same in an asynchronous manner using tokio_fs:
use failure::Error; // 0.1.6
use futures::Future; // 0.1.29
use std::path::PathBuf;
use tokio::{fs, prelude::*}; // 0.1.22
fn visit(path: PathBuf) -> impl Future<Item = (), Error = Error> {
    let task = fs::read_dir(path)
        .flatten_stream()
        .for_each(|entry| {
            println!("{:?}", entry.path());
            let path = entry.path();
            if path.is_dir() {
                let task = visit(entry.path());
                tokio::spawn(task.map_err(drop));
            }
            future::ok(())
        })
        .map_err(Error::from);
    task
}
I get the following error:
error[E0391]: cycle detected when processing `visit::{{opaque}}#0`
--> src/lib.rs:6:28
|
6 | fn visit(path: PathBuf) -> impl Future<Item = (), Error = Error> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
note: ...which requires processing `visit`...
--> src/lib.rs:6:1
|
6 | fn visit(path: PathBuf) -> impl Future<Item = (), Error = Error> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= note: ...which requires evaluating trait selection obligation `futures::future::map_err::MapErr<impl futures::future::Future, fn(failure::error::Error) {std::mem::drop::<failure::error::Error>}>: std::marker::Send`...
= note: ...which again requires processing `visit::{{opaque}}#0`, completing the cycle
note: cycle used when checking item types in top-level module
--> src/lib.rs:1:1
|
1 | / use failure::Error; // 0.1.6
2 | | use futures::Future; // 0.1.29
3 | | use std::path::PathBuf;
4 | | use tokio::{fs, prelude::*}; // 0.1.22
... |
20| | task
21| | }
| |_^
error[E0391]: cycle detected when processing `visit::{{opaque}}#0`
--> src/lib.rs:6:28
|
6 | fn visit(path: PathBuf) -> impl Future<Item = (), Error = Error> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
note: ...which requires processing `visit`...
--> src/lib.rs:6:1
|
6 | fn visit(path: PathBuf) -> impl Future<Item = (), Error = Error> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= note: ...which again requires processing `visit::{{opaque}}#0`, completing the cycle
note: cycle used when checking item types in top-level module
--> src/lib.rs:1:1
|
1 | / use failure::Error; // 0.1.6
2 | | use futures::Future; // 0.1.29
3 | | use std::path::PathBuf;
4 | | use tokio::{fs, prelude::*}; // 0.1.22
... |
20| | task
21| | }
| |_^
What is the correct way of exploring a directory and its sub-directories asynchronously while propagating all the errors?
I would make several modifications to rodrigo's existing answer:
Return a Stream from the function, allowing the caller to do what they need with a given file entry.
Return an impl Stream instead of a Box<dyn Stream>. This leaves room for more flexibility in implementation. For example, a custom type could be created that uses an internal stack instead of the less-efficient recursive types.
Return io::Error from the function to allow the user to deal with any errors.
Accept an impl Into<PathBuf> to allow a nicer API.
Create an inner hidden implementation function that uses concrete types in its API.
Futures 0.3 / Tokio 0.2
In this version, I avoided the deeply recursive calls, keeping a local stack of paths to visit (to_visit).
use futures::{stream, Stream, StreamExt}; // 0.3.1
use std::{io, path::PathBuf};
use tokio::fs::{self, DirEntry}; // 0.2.4
fn visit(path: impl Into<PathBuf>) -> impl Stream<Item = io::Result<DirEntry>> + Send + 'static {
    async fn one_level(path: PathBuf, to_visit: &mut Vec<PathBuf>) -> io::Result<Vec<DirEntry>> {
        let mut dir = fs::read_dir(path).await?;
        let mut files = Vec::new();
        while let Some(child) = dir.next_entry().await? {
            if child.metadata().await?.is_dir() {
                to_visit.push(child.path());
            } else {
                files.push(child)
            }
        }
        Ok(files)
    }
    stream::unfold(vec![path.into()], |mut to_visit| {
        async {
            let path = to_visit.pop()?;
            let file_stream = match one_level(path, &mut to_visit).await {
                Ok(files) => stream::iter(files).map(Ok).left_stream(),
                Err(e) => stream::once(async { Err(e) }).right_stream(),
            };
            Some((file_stream, to_visit))
        }
    })
    .flatten()
}
#[tokio::main]
async fn main() {
    let root_path = std::env::args().nth(1).expect("One argument required");
    let paths = visit(root_path);
    paths
        .for_each(|entry| {
            async {
                match entry {
                    Ok(entry) => println!("visiting {:?}", entry),
                    Err(e) => eprintln!("encountered an error: {}", e),
                }
            }
        })
        .await;
}
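For comparison, the same explicit-stack idea (a to_visit vector instead of recursion) looks like this with the blocking std::fs API. Unlike the streaming version above, this sketch stops at the first error instead of yielding it and continuing, and it uses DirEntry::file_type, which does not follow symbolic links, so symlinked directories are not descended into. walk_files is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Collect every non-directory entry under `root`, using an explicit stack
/// (`to_visit`) instead of recursion. The first I/O error is returned.
pub fn walk_files(root: impl Into<PathBuf>) -> io::Result<Vec<PathBuf>> {
    let mut to_visit = vec![root.into()];
    let mut files = Vec::new();
    while let Some(dir) = to_visit.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            // `file_type` does not follow symbolic links.
            if entry.file_type()?.is_dir() {
                to_visit.push(entry.path());
            } else {
                files.push(entry.path());
            }
        }
    }
    Ok(files)
}
```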
Futures 0.1 / Tokio 0.1
use std::path::PathBuf;
use tokio::{fs, prelude::*}; // 0.1.22
use tokio_fs::DirEntry; // 1.0.6
fn visit(
    path: impl Into<PathBuf>,
) -> impl Stream<Item = DirEntry, Error = std::io::Error> + Send + 'static {
    fn visit_inner(
        path: PathBuf,
    ) -> Box<dyn Stream<Item = DirEntry, Error = std::io::Error> + Send + 'static> {
        Box::new({
            fs::read_dir(path)
                .flatten_stream()
                .map(|entry| {
                    let path = entry.path();
                    if path.is_dir() {
                        // Optionally include `entry` if you want to
                        // include directories in the resulting
                        // stream.
                        visit_inner(path)
                    } else {
                        Box::new(stream::once(Ok(entry)))
                    }
                })
                .flatten()
        })
    }
    visit_inner(path.into())
}
fn main() {
    tokio::run({
        let root_path = std::env::args().nth(1).expect("One argument required");
        let paths = visit(root_path);
        paths
            .then(|entry| {
                match entry {
                    Ok(entry) => println!("visiting {:?}", entry),
                    Err(e) => eprintln!("encountered an error: {}", e),
                };
                Ok(())
            })
            .for_each(|_| Ok(()))
    });
}
See also:
How do I synchronously return a value calculated in an asynchronous Future in stable Rust?
Your code has two errors:
First, a function returning impl Trait cannot currently be recursive, because the actual type returned would depend on itself.
To make your example work, you need to return a concrete type whose definition does not depend on itself. The simplest candidate is a boxed trait object, that is, a Box<dyn Future<...>>:
fn visit(path: PathBuf) -> Box<dyn Future<Item = (), Error = Error>> {
    // ...
    let task = visit(entry.path());
    tokio::spawn(task.map_err(drop));
    // ...
    Box::new(task)
}
There is still your second error:
error[E0277]: `dyn futures::future::Future<Item = (), Error = failure::error::Error>` cannot be sent between threads safely
--> src/lib.rs:14:30
|
14 | tokio::spawn(task.map_err(drop));
| ^^^^^^^^^^^^^^^^^^ `dyn futures::future::Future<Item = (), Error = failure::error::Error>` cannot be sent between threads safely
|
::: /root/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.1.22/src/executor/mod.rs:131:52
|
131 | where F: Future<Item = (), Error = ()> + 'static + Send
| ---- required by this bound in `tokio::executor::spawn`
|
= help: the trait `std::marker::Send` is not implemented for `dyn futures::future::Future<Item = (), Error = failure::error::Error>`
= note: required because of the requirements on the impl of `std::marker::Send` for `std::ptr::Unique<dyn futures::future::Future<Item = (), Error = failure::error::Error>>`
= note: required because it appears within the type `std::boxed::Box<dyn futures::future::Future<Item = (), Error = failure::error::Error>>`
= note: required because it appears within the type `futures::future::map_err::MapErr<std::boxed::Box<dyn futures::future::Future<Item = (), Error = failure::error::Error>>, fn(failure::error::Error) {std::mem::drop::<failure::error::Error>}>`
This means that your trait object is not Send, so it cannot be scheduled for execution on another thread with tokio::spawn(). Fortunately, this is easy to fix: add + Send to your trait object (the concrete futures you box must then be Send as well):
fn visit(path: PathBuf) -> Box<dyn Future<Item = (), Error = Error> + Send> {
    //...
}
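std::thread::spawn has the same F: Send + 'static requirement as tokio::spawn, so the error and the fix can be reproduced without tokio. A minimal std-only sketch; Job, make_job and run_on_thread are hypothetical names:

```rust
use std::thread;

// A boxed job that is allowed to cross threads. Removing `+ Send` from this
// alias makes `thread::spawn(job)` fail to compile with the same kind of
// "cannot be sent between threads safely" error that `tokio::spawn` gave.
type Job = Box<dyn FnOnce() -> u64 + Send + 'static>;

fn make_job(n: u64) -> Job {
    // The closure only captures a `u64`, so it is `Send` and can be boxed
    // as a `Send` trait object.
    Box::new(move || (1..=n).sum::<u64>())
}

fn run_on_thread(job: Job) -> u64 {
    // `thread::spawn` requires `F: FnOnce() -> T + Send + 'static`.
    thread::spawn(job).join().expect("worker thread panicked")
}
```

For example, run_on_thread(make_job(10)) computes 1 + 2 + ... + 10 on a worker thread and returns 55.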

How do I convert an iterator into a stream on success or an empty stream on failure?

I'd like to take a regular iterator and turn it into a stream so that I can do further stream processing. The trouble is that I may have an iterator or an error to deal with. I think I'm pretty close with this:
#[macro_use]
extern crate log;
extern crate futures; // 0.1.21
extern crate tokio;
use futures::prelude::*;
use futures::{future, stream};
use std::fmt::Debug;
use std::net::{SocketAddr, ToSocketAddrs};
fn resolve(addrs: impl ToSocketAddrs + Debug) -> impl Stream<Item = SocketAddr, Error = ()> {
    match addrs.to_socket_addrs() {
        Ok(iter) => stream::unfold(iter, |iter| match iter.next() {
            Some(a) => Some(future::ok((a, iter))),
            None => None,
        }),
        Err(e) => {
            error!("could not resolve socket addresses {:?}: {:?}", addrs, e);
            stream::empty()
        }
    }
}
fn main() {
    let task = resolve("1.2.3.4:12345")
        .map_err(|e| error!("{:?}", e))
        .for_each(|addr| info!("{:?}", addr))
        .fold();
    tokio::run(task);
}
error[E0308]: match arms have incompatible types
--> src/main.rs:12:5
|
12 | / match addrs.to_socket_addrs() {
13 | | Ok(iter) => stream::unfold(iter, |iter| match iter.next() {
14 | | Some(a) => Some(future::ok((a, iter))),
15 | | None => None,
... |
20 | | }
21 | | }
| |_____^ expected struct `futures::stream::Unfold`, found struct `futures::stream::Empty`
|
= note: expected type `futures::stream::Unfold<<impl ToSocketAddrs + Debug as std::net::ToSocketAddrs>::Iter, [closure@src/main.rs:13:42: 16:10], futures::FutureResult<(std::net::SocketAddr, <impl ToSocketAddrs + Debug as std::net::ToSocketAddrs>::Iter), _>>`
found type `futures::stream::Empty<_, _>`
note: match arm with an incompatible type
--> src/main.rs:17:19
|
17 | Err(e) => {
| ___________________^
18 | | error!("could not resolve socket addresses {:?}: {:?}", addrs, e);
19 | | stream::empty()
20 | | }
| |_________^
error[E0277]: the trait bound `(): futures::Future` is not satisfied
--> src/main.rs:27:10
|
27 | .for_each(|addr| info!("{:?}", addr))
| ^^^^^^^^ the trait `futures::Future` is not implemented for `()`
|
= note: required because of the requirements on the impl of `futures::IntoFuture` for `()`
error[E0599]: no method named `fold` found for type `futures::stream::ForEach<futures::stream::MapErr<impl futures::Stream, [closure@src/main.rs:26:18: 26:39]>, [closure@src/main.rs:27:19: 27:45], ()>` in the current scope
--> src/main.rs:28:10
|
28 | .fold();
| ^^^^
|
= note: the method `fold` exists but the following trait bounds were not satisfied:
`&mut futures::stream::ForEach<futures::stream::MapErr<impl futures::Stream, [closure@src/main.rs:26:18: 26:39]>, [closure@src/main.rs:27:19: 27:45], ()> : futures::Stream`
`&mut futures::stream::ForEach<futures::stream::MapErr<impl futures::Stream, [closure@src/main.rs:26:18: 26:39]>, [closure@src/main.rs:27:19: 27:45], ()> : std::iter::Iterator`
The hint is pretty obvious: the two streams I'm returning from the match arms have different types, and they should be the same. Now, how can I do that so that I return a stream?
Rust is a statically typed language, which means that the return type of a function has to be a single type, known at compile time; impl Trait only hides the name of that one type. You are attempting to return multiple types, decided at runtime.
The closest solution to your original is to always return the Unfold stream:
fn resolve(addrs: impl ToSocketAddrs) -> impl Stream<Item = SocketAddr, Error = ()> {
    stream::unfold(addrs.to_socket_addrs(), |r| {
        match r {
            Ok(mut iter) => iter.next().map(|addr| future::ok((addr, Ok(iter)))),
            Err(_) => None,
        }
    })
}
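For comparison, here is the same "fold the error into the state" idea with plain std iterators, where std::iter::from_fn plays the role of stream::unfold. This is a synchronous sketch rather than a stream, and resolve_sync is a hypothetical name:

```rust
use std::net::{SocketAddr, ToSocketAddrs};

// The outcome of `to_socket_addrs` becomes the iterator's state, so every
// path returns the same concrete type. An error leaves the state as `None`,
// which makes the iterator empty.
fn resolve_sync(addrs: impl ToSocketAddrs) -> impl Iterator<Item = SocketAddr> {
    let mut state = addrs.to_socket_addrs().ok();
    std::iter::from_fn(move || state.as_mut()?.next())
}
```

Like the unfold version, this silently drops the error; a string without a port, such as "no port here", simply yields nothing.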
But why reinvent the wheel?
futures::stream::iter_ok
Converts an Iterator into a Stream which is always ready to yield the next value.
Subsequent versions of the futures crate implement Stream for Either, which makes this very elegant:
fn resolve(addrs: impl ToSocketAddrs) -> impl Stream<Item = SocketAddr, Error = ()> {
    match addrs.to_socket_addrs() {
        Ok(iter) => stream::iter_ok(iter).left_stream(),
        Err(_) => stream::empty().right_stream(),
    }
}
It's straightforward to backport this functionality to futures 0.1 (maybe someone should submit it as a PR for those who are stuck on 0.1...):
enum MyEither<L, R> {
    Left(L),
    Right(R),
}
impl<L, R> Stream for MyEither<L, R>
where
    L: Stream,
    R: Stream<Item = L::Item, Error = L::Error>,
{
    type Item = L::Item;
    type Error = L::Error;
    fn poll(&mut self) -> Poll<Option<Self::Item>, Self::Error> {
        match self {
            MyEither::Left(l) => l.poll(),
            MyEither::Right(r) => r.poll(),
        }
    }
}
trait EitherStreamExt {
    fn left_stream<R>(self) -> MyEither<Self, R>
    where
        Self: Sized;
    fn right_stream<L>(self) -> MyEither<L, Self>
    where
        Self: Sized;
}
impl<S: Stream> EitherStreamExt for S {
    fn left_stream<R>(self) -> MyEither<Self, R> {
        MyEither::Left(self)
    }
    fn right_stream<L>(self) -> MyEither<L, Self> {
        MyEither::Right(self)
    }
}
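The enum trick is not specific to futures. A std-only sketch of the same pattern for Iterator, using hypothetical names EitherIter and resolve_either, shows that once both match arms produce one enum type, the impl Trait return type compiles:

```rust
use std::net::{SocketAddr, ToSocketAddrs};

// One type that can hold either of two iterators with the same `Item`.
enum EitherIter<L, R> {
    Left(L),
    Right(R),
}

impl<L, R> Iterator for EitherIter<L, R>
where
    L: Iterator,
    R: Iterator<Item = L::Item>,
{
    type Item = L::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // Delegate to whichever iterator is actually stored.
        match self {
            EitherIter::Left(l) => l.next(),
            EitherIter::Right(r) => r.next(),
        }
    }
}

// Both arms now have the type `EitherIter<_, Empty<SocketAddr>>`.
fn resolve_either(addrs: impl ToSocketAddrs) -> impl Iterator<Item = SocketAddr> {
    match addrs.to_socket_addrs() {
        Ok(iter) => EitherIter::Left(iter),
        Err(_) => EitherIter::Right(std::iter::empty::<SocketAddr>()),
    }
}
```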
Even better, use the fact that Result implements IntoIterator (yielding the Ok value once, or nothing for an Err) and that Stream::flatten exists:
fn resolve(addrs: impl ToSocketAddrs) -> impl Stream<Item = SocketAddr, Error = ()> {
    stream::iter_ok(addrs.to_socket_addrs())
        .map(stream::iter_ok)
        .flatten()
}
Or if you really want to print errors:
fn resolve(addrs: impl ToSocketAddrs) -> impl Stream<Item = SocketAddr, Error = ()> {
    stream::once(addrs.to_socket_addrs())
        .map(stream::iter_ok)
        .map_err(|e| eprintln!("err: {}", e))
        .flatten()
}
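The Result-flattening idea carries over directly to synchronous std iterators: Result::into_iter yields the Ok value at most once, and Iterator::flatten then unpacks the inner address iterator. A minimal sketch covering both variants above; resolve_flat and resolve_logged are hypothetical names:

```rust
use std::net::{SocketAddr, ToSocketAddrs};

// An `Err` becomes an empty iterator; an `Ok(iter)` is flattened into its items.
fn resolve_flat(addrs: impl ToSocketAddrs) -> impl Iterator<Item = SocketAddr> {
    addrs.to_socket_addrs().into_iter().flatten()
}

// Same idea, but report the error on stderr before discarding it.
fn resolve_logged(addrs: impl ToSocketAddrs) -> impl Iterator<Item = SocketAddr> {
    addrs
        .to_socket_addrs()
        .map_err(|e| eprintln!("err: {}", e))
        .into_iter()
        .flatten()
}
```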
See also:
Conditionally return empty iterator from flat_map
Conditionally iterate over one of several possible iterators
What is the correct way to return an Iterator (or any other trait)?

Resources