Understanding how to implement a Wrapper type for a Stream - asynchronous

I was wondering if anyone could give me any pointers on the best way to go about handling wake-ups when writing a wrapper for a Stream.
For context, I've got a byte stream coming in via an HTTP request (using reqwest), and I'm doing some filtering and mapping on that stream to handle validation and deserialization. Effectively, whenever the inner stream produces a value I want this stream to (potentially) emit a value.
**Edit**
An additional caveat is that the stream also needs to hold a small amount of state (a Vec<String>, the columns field) that it must be able to reference on each poll.
The Solution
This turned out to be me not understanding how the stream works under the hood. Rodrigo's answer below was completely correct. I did just need to return the Poll::Pending that came from the inner stream; my mistake was matching on it and returning a Poll::Pending I constructed myself, which is why the stream wasn't being woken up appropriately.
If it's useful to anyone: instead of matching on the output of inner_stream.poll_next(), I ended up just mapping the value and returning that, to make sure I was building off the Polls of the inner stream, e.g.:
return Pin::new(&mut this.stream).poll_next(cx).map(|data| { ... })
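Spelled out a little more, a sketch of that delegating poll_next (not the exact code from the project; parse_row and the Error: From<reqwest::Error> conversion are hypothetical stand-ins for the elided parsing logic) might look like:

fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
    let this = self.project();
    // Delegate to the inner stream. When it returns Poll::Pending it has already
    // registered the waker with `cx`, so mapping its output is enough; no
    // hand-rolled Poll::Pending is ever constructed here.
    this.stream.poll_next(cx).map(|maybe_chunk| {
        maybe_chunk.map(|chunk| {
            let bytes = chunk?; // assumes Error: From<reqwest::Error>
            parse_row::<T>(this.columns, &bytes) // hypothetical parsing helper
        })
    })
}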
Thanks to everyone who commented and helped out!
Context for the original question
The wrapper type:
pin_project! {
    #[derive(Default)]
    struct QueryStream<T, S> where S: Stream, T: DeserializeOwned {
        columns: Vec<String>,
        #[pin]
        stream: S,
        has_closed: bool,
        _marker: PhantomData<T>
    }
}
The only implementation of Stream that I've managed to get to work on the wrapper type is one that spins on the inner stream when it returns Poll::Pending. This doesn't seem ideal though as I believe it would block until a value is emitted?
impl<T, S> Stream for QueryStream<T, S>
where
    T: DeserializeOwned,
    S: Stream<Item = std::result::Result<Bytes, reqwest::Error>>,
{
    type Item = Result<T>;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        let mut this = self.project();
        loop {
            if *this.has_closed {
                return Poll::Ready(None);
            }
            match Pin::new(&mut this.stream).poll_next(cx) {
                Poll::Ready(Some(data)) => {
                    // Parsing Logic Here
                    return Poll::Ready(Some(Ok::<_, Error>(resp)));
                }
                Poll::Ready(None) => return Poll::Ready(None),
                Poll::Pending => {}
            }
        }
    }
}
Trying to remove the loop (and changing the Poll::Pending match arm to Poll::Pending => Poll::Pending) generally results in poll only being called once before hanging. From my very rough understanding, this is because I'm dropping the reference to the waker when I return from this function, since it's not stored anywhere.
However, I'm struggling to work out how to arrange my struct/code so that the reference can be stored, or alternatively what the best way to make use of that waker is. Is anyone able to explain how this problem can be solved?
Many thanks in advance!

Allow a future to store a pointer to a pinned value in its container

Prelude
I have been working on this segment of code that attempts to provide a reusable API for implementing an asynchronous stream for a REST paginator.
I have gone through many iterations and settled on storing the state in an enum that describes what point the process is at, both because I feel that it is the best fit for this purpose and also because it is something to learn from, being especially explicit about the whole process. I do not want to use stream! or try_stream! from the async-stream crate.
The state begins at Begin, and moves a PaginationDelegate into the next state after using it to make a request. This state is Pending and owns the delegate and a future that is returned from PaginationDelegate::next_page.
The issue appears when the next_page method needs a reference, &self, but the self is not stored on the stack frame of the future that is stored within the Pending state.
I wanted to keep this "flat" because I find the algorithm easier to follow, but I also wanted to learn how to create this self-referential structure the most correct way. I am aware that I can wrap the future and have it own the PaginationDelegate, and indeed this may be the method I end up using. Nevertheless, I want to know how I could move the two values into the same holding structure and keep the pointer alive for my own education.
Delegate Trait
Here a PaginationDelegate is defined. This trait is intended to be implemented and used by any method or function that intends to return a PaginatedStream or dyn Stream. Its purpose is to define how the requests will be made, as well as to store a limited subset of the state (the offset for the next page from the REST API, and the total number of items that are expected from the API).
#[async_trait]
pub trait PaginationDelegate {
    type Item;
    type Error;

    /// Performs an asynchronous request for the next page and returns either
    /// a vector of the result items or an error.
    async fn next_page(&self) -> Result<Vec<Self::Item>, Self::Error>;

    /// Gets the current offset, which will be the index at the end of the
    /// current/previous page. The value returned from this will be changed by
    /// [`PaginatedStream`] immediately following a successful call to
    /// [`next_page()`], increasing by the number of items returned.
    fn offset(&self) -> usize;

    /// Sets the offset for the next page. The offset is required to be the
    /// index of the last item from the previous page.
    fn set_offset(&mut self, value: usize);

    /// Gets the total count of items that are currently expected from the API.
    /// This may change if the API returns a different number of results on
    /// subsequent pages, and may be less than what the API claims in its
    /// response data if the API has a maximum limit.
    fn total_items(&self) -> Option<usize>;
}
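For illustration, a minimal in-memory implementation of the trait might look like the following (the VecDelegate type and its paging rules are stand-ins invented for this example, not part of the original code):

struct VecDelegate {
    data: Vec<u32>,
    offset: usize,
    page_size: usize,
}

#[async_trait]
impl PaginationDelegate for VecDelegate {
    type Item = u32;
    type Error = ();

    // Serves the next `page_size` items from the in-memory data set; a real
    // delegate would perform an HTTP request here instead.
    async fn next_page(&self) -> Result<Vec<Self::Item>, Self::Error> {
        let start = self.offset.min(self.data.len());
        let end = (start + self.page_size).min(self.data.len());
        Ok(self.data[start..end].to_vec())
    }

    fn offset(&self) -> usize {
        self.offset
    }

    fn set_offset(&mut self, value: usize) {
        self.offset = value;
    }

    fn total_items(&self) -> Option<usize> {
        Some(self.data.len())
    }
}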
Stream State
The next segment is the enum itself, which serves as the implementor of Stream and the holder for the current state of the iterator.
Note that currently the Pending variant has the delegate and the future separate. I could have used future: Pin<Box<dyn Future<Output = Result<(D, Vec<D::Item>), D::Error>>>> to keep the delegate inside of the Future but prefer not to because I want to solve the underlying problem and not gloss over it. Also, the delegate field is a Pin<Box<D>> because I was experimenting and I feel that this is the closest I have gotten to a correct solution.
pub enum PaginatedStream<D: PaginationDelegate> {
    Begin {
        delegate: D,
    },
    Pending {
        delegate: Pin<Box<D>>,
        #[allow(clippy::type_complexity)]
        future: Pin<Box<dyn Future<Output = Result<Vec<D::Item>, D::Error>>>>,
    },
    Ready {
        delegate: D,
        items: VecDeque<D::Item>,
    },
    Closed,
    Indeterminate,
}
Stream Implementation
The last part is the implementation of Stream. It is incomplete for two reasons: I have not finished it, and it is best to keep the example short and minimal.
impl<D: 'static> Stream for PaginatedStream<D>
where
    D: PaginationDelegate + Unpin,
    D::Item: Unpin,
{
    // If the state is `Pending` and the future resolves to an `Err`, that error is
    // forwarded only once and the state set to `Closed`. If there is at least one
    // result to return, the `Ok` variant is, of course, used instead.
    type Item = Result<D::Item, D::Error>;

    fn poll_next(mut self: Pin<&mut Self>, ctx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        // Avoid using the full namespace to match all variants.
        use PaginatedStream::*;

        // Take ownership of the current state (`self`) and replace it with the
        // `Indeterminate` state until the new state is in fact determined.
        let this = std::mem::replace(&mut *self, Indeterminate);

        match this {
            // This state only occurs at the entry of the state machine. It only holds the
            // `PaginationDelegate` that will be used to update the offset and make new requests.
            Begin { delegate } => {
                // Pin the delegate to the heap to ensure that it doesn't move and that pointers
                // remain valid even after moving the value into the new state.
                let delegate = Box::pin(delegate);

                // Set the current state to `Pending`, after making the next request using the
                // pinned delegate.
                self.set(Pending {
                    delegate,
                    future: PaginationDelegate::next_page(delegate.as_ref()),
                });

                // Return the distilled version of the new state to the caller, indicating that a
                // new request has been made and we are waiting for new data.
                Poll::Pending
            }
            // At some point in the past this stream was polled and made a new request. Now it is
            // time to poll the future returned from that request, and if results are available,
            // unpack them to the `Ready` state and move the delegate. If the future still doesn't
            // have results, set the state back to `Pending` and move the fields back into position.
            Pending { delegate, future } => todo!(),
            // The request has resolved with data in the past, and there are items ready for us to
            // provide the caller. In the event that there are no more items in the `VecDeque`, we
            // will make the next request and construct the state for `Pending` again.
            Ready { delegate, items } => todo!(),
            // Either an error has occurred, or the last item has been yielded already. Nobody
            // should be polling anymore, but to be nice, just tell them that there are no more
            // results with `Poll::Ready(None)`.
            Closed => Poll::Ready(None),
            // The `Indeterminate` state should have only been used internally and reset back to a
            // valid state before yielding the `Poll` to the caller. This branch should never be
            // reached; if it is, we panic.
            Indeterminate => unreachable!(),
        }
    }
}
Compiler Messages
At the moment, in the Begin branch, there are two compiler messages where the borrow of the delegate (delegate.as_ref()) is taken and passed to the PaginationDelegate::next_page method.
The first is that the delegate does not live long enough, because the pinned value is moved into the new state variant Pending and no longer resides at the position it was assigned. I do not understand why the compiler wants this to exist for 'static, though, and would appreciate it if this could be explained.
error[E0597]: `delegate` does not live long enough
  --> src/lib.rs:90:59
   |
90 | future: PaginationDelegate::next_page(delegate.as_ref()),
   |         ------------------------------^^^^^^^^^^^^^^^^^-
   |         |                             |
   |         |                             borrowed value does not live long enough
   |         cast requires that `delegate` is borrowed for `'static`
...
96 | }
   | - `delegate` dropped here while still borrowed
I would also like to hear any methods you have for creating the values for fields of a struct that rely on data that should be moved into the struct (self-referential, the main issue of this entire post). I know it is wrong (and impossible) to use MaybeUninit here because any placeholder value that would later be dropped will cause undefined behavior. Possibly show me a method for allocating a structure of uninitialized memory and then overwriting those fields with values after they have been constructed, without letting the compiler attempt to free the uninitialized memory.
The second compiler message is as follows, which is similar to the first except that the temporary value for delegate is moved into the struct. I am to understand that this is fundamentally the same issue described above, but just explained differently by two separate heuristics. Is my understanding wrong?
error[E0382]: borrow of moved value: `delegate`
  --> src/lib.rs:90:59
   |
84 | let delegate = Box::pin(delegate);
   |     -------- move occurs because `delegate` has type `Pin<Box<D>>`, which does not implement the `Copy` trait
...
89 | delegate,
   | -------- value moved here
90 | future: PaginationDelegate::next_page(delegate.as_ref()),
   |                                       ^^^^^^^^^^^^^^^^^ value borrowed here after move
Environment
This is real code, but I believe it is already an MCVE.
To set up the environment for this, the crate dependencies are as follows.
[dependencies]
futures-core = "0.3"
async-trait = "0.1"
And the imports that are used in the code,
use std::collections::VecDeque;
use std::pin::Pin;
use std::task::{Context, Poll};
use async_trait::async_trait;
use futures_core::{Future, Stream};
The potential solution that I did not want to use, because it hides the underlying issue (or rather avoids the intent of this question entirely), follows.
Where the PaginatedStream enum is defined, change the Pending variant to the following.
Pending {
    #[allow(clippy::type_complexity)]
    future: Pin<Box<dyn Future<Output = Result<(D, Vec<D::Item>), D::Error>>>>,
},
Now, inside the implementation of Stream, change the match arm for Begin to the following.
// This state only occurs at the entry of the state machine. It only holds the
// `PaginationDelegate` that will be used to update the offset and make new requests.
Begin { delegate } => {
    self.set(Pending {
        // Construct a new future that awaits the result and has a new type for `Output`
        // that contains both the result and the moved delegate.
        // Here the delegate is moved into the future via the `async` block.
        future: Box::pin(async {
            let result = delegate.next_page().await;
            result.map(|items| (delegate, items))
        }),
    });

    // Return the distilled version of the new state to the caller, indicating that a
    // new request has been made and we are waiting for new data.
    Poll::Pending
}
The compiler knows that this async block really needs to behave as async move; you could be more explicit and write it that way if you wanted. This effectively moves the delegate into the stack frame of the future that is boxed and pinned, ensuring that whenever the value is moved in memory the two values move together and the pointer cannot be invalidated.
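As a tiny, standalone illustration of that capture rule (my own example, unrelated to the pagination code): because the block consumes the variable, it is captured by value even without writing async move explicitly.

fn capture_demo() {
    let name = String::from("delegate");

    let fut = async {
        // `name` is moved into the future here, so the block captures it by value.
        let owned = name;
        println!("captured {}", owned);
    };

    // `name` can no longer be used at this point; the future owns it.
    drop(fut);
}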
The other match arm, for Pending, needs to be updated to reflect the change in signature. Here is a complete implementation of that logic.
// At some point in the past this stream was polled and asked the delegate to make a new
// request. Now it is time to poll the future returned from that request, and if results
// are available, unpack them to the `Ready` state and move the delegate. If the future
// still doesn't have results, set the state back to `Pending` and move the fields back
// into position.
Pending { mut future } => match future.as_mut().poll(ctx) {
    // The future from the last request returned successfully with new items,
    // and gave the delegate back.
    Poll::Ready(Ok((mut delegate, items))) => {
        // Tell the delegate the offset for the next page, which is the sum of the
        // old offset and the number of items that the API sent back.
        delegate.set_offset(delegate.offset() + items.len());

        // Construct a new `VecDeque` so that the items can be popped from the front.
        // This should be more efficient than reversing the `Vec`, and less confusing.
        let mut items = VecDeque::from(items);
        // Get the first item out so that it can be yielded. The case where there are no
        // more items should have been handled by the `Ready` branch, so it should be
        // safe to unwrap.
        let popped = items.pop_front().unwrap();

        // Set the new state to `Ready` with the delegate and the items.
        self.set(Ready { delegate, items });

        Poll::Ready(Some(Ok(popped)))
    }
    // The future from the last request returned with an error.
    Poll::Ready(Err(error)) => {
        // Set the state to `Closed` so that any future polls will return
        // `Poll::Ready(None)`. The caller can even match against this if needed.
        self.set(Closed);

        // Forward the error to whoever polled. This will only happen once because the
        // error is moved, and the state set to `Closed`.
        Poll::Ready(Some(Err(error)))
    }
    // The future from the last request is still pending.
    Poll::Pending => {
        // Because the state is currently `Indeterminate` it must be set back to what it
        // was. This will move the future back into the state.
        self.set(Pending { future });

        // Tell the caller that we are still waiting for a response.
        Poll::Pending
    }
},
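The Ready arm was left as todo!() earlier; continuing in the same style, one possible version (a sketch, not part of the original answer; the explicit wake_by_ref is an extra assumption so the task gets polled again after a new request is created) could be:

Ready { delegate, mut items } => match items.pop_front() {
    // There is at least one buffered item left: put the rest back and yield it.
    Some(item) => {
        self.set(Ready { delegate, items });
        Poll::Ready(Some(Ok(item)))
    }
    // The buffer is exhausted: start the next request, exactly as `Begin` does.
    // A fuller version might consult `delegate.total_items()` first and close
    // the stream instead of requesting another page.
    None => {
        self.set(Pending {
            future: Box::pin(async move {
                let result = delegate.next_page().await;
                result.map(|items| (delegate, items))
            }),
        });
        // The new future has not been polled yet, so request another poll right
        // away; otherwise nothing would ever wake this task again.
        ctx.waker().wake_by_ref();
        Poll::Pending
    }
},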

Rust ownership issues

I'm quite new to Rust. I'm mainly a C#, JavaScript, and Python developer, so I like to approach things in an OOP way, but I still can't wrap my head around ownership in Rust, especially when it comes to OOP.
I'm writing a TCP server. I have a struct that contains connections (streams), and I read the sockets asynchronously using the mio crate. I understand what the error is telling me, but I have no clue how to fix it. I tried changing the read_message method into a function (without the reference to self), which worked, but the problem is that I'll need to access the connections and other fields of the struct (to relay messages between sockets, for example), so this workaround won't be viable in later versions. Is there an easy fix for this, or is the design inherently flawed?
Here's a snippet that shows what my problem is:
let sock = self.connections.get_mut(&token).unwrap();
loop {
    match sock.read(&mut msg_type) {
        Ok(_) => {
            self.read_message(msg_type[0], token);
        }
    }
}

fn read_message(&mut self, msg_type: u8, token: Token) {
    let sock = self.connections.get_mut(&token).unwrap();
    let msg_type = num::FromPrimitive::from_u8(msg_type);
    match msg_type {
        Some(MsgType::RequestIps) => {
            let decoded: MsgTypes::Announce = bincode::deserialize_from(sock).unwrap();
            println!("Public Key: {}", decoded.public_key);
        }
        _ => unreachable!()
    }
}
The error I'm getting is that self cannot be borrowed as mutable more than once at a time.
You are holding a mutable borrow on sock, which is part of self, at the moment you try to call self.read_message. Since you indicated that read_message needs mutable access to all of self, you need to make sure you don't have a mutable borrow on sock anymore at that point.
Fortunately, thanks to non-lexical lifetimes in Rust 2018, that's not hard to do; simply fetch sock inside the loop:
loop {
    let sock = self.connections.get_mut(&token).unwrap();
    match sock.read(&mut msg_type) {
        Ok(_) => {
            self.read_message(msg_type[0], token);
        }
    }
}
Assuming sock.read doesn't return anything that holds a borrow on sock, this should let the mutable borrow on sock be released before calling self.read_message. It needs to be re-acquired in the next iteration, but seeing as you're doing network I/O, the relative performance penalty of a single HashMap (?) access should be negligible.
(Due to lack of a minimal, compileable example, I wasn't able to test this.)
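As a sanity check, here is a minimal, self-contained illustration of the same pattern that does compile, with a HashMap of byte buffers standing in for the mio connections (the types here are hypothetical, chosen only to mirror the shape of the original code):

use std::collections::HashMap;

struct Server {
    // Stand-in for `Token -> TcpStream`: each "connection" is just a byte buffer.
    connections: HashMap<u32, Vec<u8>>,
}

impl Server {
    fn handle(&mut self, token: u32) {
        loop {
            // Re-borrow the connection on every iteration so the mutable borrow
            // ends before `self.read_message` needs all of `self` again.
            let sock = self.connections.get_mut(&token).unwrap();
            let msg_type = match sock.pop() {
                Some(byte) => byte,
                None => break,
            };
            self.read_message(msg_type, token);
        }
    }

    fn read_message(&mut self, msg_type: u8, token: u32) {
        // Taking another mutable borrow is fine here; the one from `handle` has ended.
        let sock = self.connections.get_mut(&token).unwrap();
        println!("msg {} on connection {} ({} bytes left)", msg_type, token, sock.len());
    }
}

fn main() {
    let mut server = Server {
        connections: HashMap::from([(1, vec![3, 2, 1])]),
    };
    server.handle(1);
}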

How to use Rust futures in callbacks?

Is there any way to use futures in callbacks? For example...
// Send message on multiple channels while removing ones that are closed.
use smol::channel::Sender;
...
// (expecting bool, found opaque type)
vec_of_sender.retain(|sender| async {
    sender.send(msg.clone()).await.is_ok()
});
My work-around is to loop twice: On the first pass I delete closed senders (non-async) and on the second I do the actual send (async using for sender in ...). But it seems like I should be able to do it all in a single retain() call.
You can't use retain in this way. The closure that retain accepts must implement FnMut(&T) -> bool, but every async function returns an implementation of Future.
You can turn an async function into a synchronous one by blocking on it. For example, if you were using tokio, you could do this:
use tokio::runtime::Runtime;
let rt = Runtime::new().unwrap();
vec_of_sender.retain(|sender| {
rt.block_on(async { sender.send().await.is_ok() })
});
However, there is overhead to adding an async runtime, and I have a feeling that you are trying to solve the wrong problem.
The closure passed to retain must return a bool, but every async function returns impl Future. Instead, you can use Stream, which is the asynchronous version of Iterator. You can convert the vector into a Stream:
let stream = stream::iter(vec_of_sender);
And then use the filter method, which accepts an asynchronous closure and returns a new Stream:
let vec_of_sender = stream.filter(|sender| async {
    sender.send(msg.clone()).await.is_ok()
}).collect::<Vec<Sender>>().await;
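One caveat with the snippet above: as written, the async block borrows sender from the closure argument, which filter's signature does not allow, so it may not compile verbatim. A self-contained variant that clones up front (assuming smol's channels and the futures crate for StreamExt; the channel setup is invented for illustration) might look like this:

use futures::stream::{self, StreamExt};
use smol::channel::{unbounded, Sender};

fn main() {
    smol::block_on(async {
        let msg = String::from("hello");

        // Two hypothetical channels: one whose receiver is dropped (closed),
        // one that is still open.
        let (closed_tx, closed_rx) = unbounded::<String>();
        drop(closed_rx);
        let (open_tx, _open_rx) = unbounded::<String>();
        let vec_of_sender = vec![closed_tx, open_tx];

        let vec_of_sender: Vec<Sender<String>> = stream::iter(vec_of_sender)
            .filter(|sender| {
                // Clone so the async block owns its data and does not borrow
                // from the closure argument.
                let sender = sender.clone();
                let msg = msg.clone();
                async move { sender.send(msg).await.is_ok() }
            })
            .collect()
            .await;

        println!("{} sender(s) still open", vec_of_sender.len());
    });
}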
To avoid creating a new Vec, you can also use swap_remove:
let mut i = 0usize;
while i < vec_of_sender.len() {
    if vec_of_sender[i].send(msg.clone()).await.is_ok() {
        i += 1;
    } else {
        vec_of_sender.swap_remove(i);
    }
}
Note that this will change the order of the vector.

How do I execute an async/await function without using any external dependencies?

I am attempting to create the simplest possible example that can get async fn hello() to eventually print out Hello, World!. This should happen without any external dependency like tokio, just plain Rust and std. Bonus points if we can get it done without ever using unsafe.
#![feature(async_await)]

async fn hello() {
    println!("Hello, World!");
}

fn main() {
    let task = hello();
    // Something beautiful happens here, and `Hello, World!` is printed on screen.
}
I know async/await is still a nightly feature, and it is subject to change in the foreseeable future.
I know there are a whole lot of Future implementations, and I am aware of the existence of tokio.
I am just trying to educate myself on the inner workings of standard library futures.
My helpless, clumsy endeavours
My vague understanding is that, first off, I need to Pin task down. So I went ahead and
let pinned_task = Pin::new(&mut task);
but
the trait `std::marker::Unpin` is not implemented for `std::future::GenFuture<[static generator#src/main.rs:7:18: 9:2 {}]>`
so I thought, of course, I probably need to Box it, so I'm sure it won't move around in memory. Somewhat surprisingly, I get the same error.
What I could get so far is
let pinned_task = unsafe {
    Pin::new_unchecked(&mut task)
};
which is obviously not something I should do. Even so, let's say I got my hands on the Pinned Future. Now I need to poll() it somehow. For that, I need a Waker.
So I tried to look around on how to get my hands on a Waker. On the doc it kinda looks like the only way to get a Waker is with another new_unchecked that accepts a RawWaker. From there I got here and from there here, where I just curled up on the floor and started crying.
This part of the futures stack is not intended to be implemented by many people. The rough estimate that I have seen is that maybe there will be 10 or so actual implementations.
That said, you can fill in the basic aspects of an executor that is extremely limited by following the function signatures needed:
async fn hello() {
    println!("Hello, World!");
}

fn main() {
    drive_to_completion(hello());
}

use std::{
    future::Future,
    ptr,
    task::{Context, Poll, RawWaker, RawWakerVTable, Waker},
};

fn drive_to_completion<F>(f: F) -> F::Output
where
    F: Future,
{
    let waker = my_waker();
    let mut context = Context::from_waker(&waker);

    let mut t = Box::pin(f);
    let t = t.as_mut();

    loop {
        match t.poll(&mut context) {
            Poll::Ready(v) => return v,
            Poll::Pending => panic!("This executor does not support futures that are not ready"),
        }
    }
}

type WakerData = *const ();

unsafe fn clone(_: WakerData) -> RawWaker {
    my_raw_waker()
}
unsafe fn wake(_: WakerData) {}
unsafe fn wake_by_ref(_: WakerData) {}
unsafe fn drop(_: WakerData) {}

static MY_VTABLE: RawWakerVTable = RawWakerVTable::new(clone, wake, wake_by_ref, drop);

fn my_raw_waker() -> RawWaker {
    RawWaker::new(ptr::null(), &MY_VTABLE)
}

fn my_waker() -> Waker {
    unsafe { Waker::from_raw(my_raw_waker()) }
}
Starting at Future::poll, we see we need a Pinned future and a Context. Context is created from a Waker which needs a RawWaker. A RawWaker needs a RawWakerVTable. We create all of those pieces in the simplest possible ways:
Since we aren't trying to support NotReady cases, we never need to actually do anything for that case and can instead panic. This also means that the implementations of wake can be no-ops.
Since we aren't trying to be efficient, we don't need to store any data for our waker, so clone and drop can basically be no-ops as well.
The easiest way to pin the future is to Box it, but this isn't the most efficient possibility.
If you wanted to support NotReady, the simplest extension is to have a busy loop, polling forever. A slightly more efficient solution is to have a global variable that indicates that someone has called wake and block on that becoming true.
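For completeness, a minimal sketch of that busy-wait extension (my own variant of drive_to_completion above, reusing its my_waker helper) could look like this; note the as_mut() inside the loop so the pinned Box can be polled repeatedly:

fn drive_by_spinning<F>(f: F) -> F::Output
where
    F: Future,
{
    let waker = my_waker();
    let mut context = Context::from_waker(&waker);
    let mut t = Box::pin(f);

    loop {
        // Re-pin a fresh `Pin<&mut F>` each iteration so the future can be polled again.
        match t.as_mut().poll(&mut context) {
            Poll::Ready(v) => return v,
            // Spin until the future makes progress; a real executor would park the
            // thread and rely on the waker being invoked instead of burning CPU.
            Poll::Pending => std::thread::yield_now(),
        }
    }
}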

Can this Rust code be written without the "match" statement?

linuxfood has created bindings for sqlite3, for which I am thankful. I'm just starting to learn Rust (0.8), and I'm trying to understand exactly what this bit of code is doing:
extern mod sqlite;

fn db() {
    let database =
        match sqlite::open("test.db") {
            Ok(db) => db,
            Err(e) => {
                println(fmt!("Error opening test.db: %?", e));
                return;
            }
        };
I do understand basically what it is doing. It is attempting to obtain a database connection and also testing for an error. I don't understand exactly how it is doing that.
In order to better understand it, I wanted to rewrite it without the match statement, but I don't have the knowledge to do that. Is that possible? Does sqlite::open() return two variables, or only one?
How can this example be written differently without the match statement? I'm not saying that is necessary or preferable, however it may help me to learn the language.
The outer statement is an assignment that assigns the value of the match expression to database. The match expression depends on the return value of sqlite::open, which probably is of type Result<T, E> (an enum with variants Ok(T) and Err(E)). In case it's Ok, the enum variant has a parameter which the match expression destructures into db and passes back this value (therefore it gets assigned to the variable database). In case it's Err, the enum variant has a parameter with an error object which is printed and the function returns.
Without using a match statement, this could be written like the following (just because you explicitly asked for not using match; most people would consider this bad coding style):
let res = sqlite::open("test.db");
if res.is_err() {
    println!("Error opening test.db: {:?}", res.unwrap_err());
    return;
}
let database = res.unwrap();
I'm just learning Rust myself, but this is another way of dealing with this.
if let Ok(database) = sqlite::open("test.db") {
    // Handle success case
} else {
    // Handle error case
}
See the documentation about if let.
This function open returns SqliteResult<Database>; given the definition pub type SqliteResult<T> = Result<T, ResultCode>, that is std::result::Result<Database, ResultCode>.
Result is an enum, and you fundamentally cannot access the variants of an enum without matching: that is, quite literally, the only way. Sure, you may have methods for it abstracting away the matching, but they are necessarily implemented with match.
You can see from the Result documentation that it does have convenience methods like is_err, which is approximately this (it's not precisely this but close enough):
fn is_err(&self) -> bool {
    match *self {
        Ok(_) => false,
        Err(_) => true,
    }
}
and unwrap (again only approximate):
fn unwrap(self) -> T {
    match self {
        Ok(t) => t,
        Err(e) => fail!(),
    }
}
As you see, these are implemented with matching. In this case of yours, using the matching is the best way to write this code.
sqlite::open() is returning an enum. Enums are a little different in Rust: each variant of an enum can have fields attached to it.
See http://static.rust-lang.org/doc/0.8/tutorial.html#enums
So in this case the SqliteResult enum can either be Ok or Err. If it is Ok, then it has the reference to the db attached to it; if it is Err, then it has the error details.
With a C# or Java background you could consider SqliteResult as a base class that Ok and Err inherit from, each with their own relevant information. In this scenario the match clause is simply checking the type to see which subtype was returned. I wouldn't get too fixated on this parallel, though; it is a bad idea to try too hard to map concepts between languages.
