Reference is held beyond being used even though the scope has ownership of the variable - asynchronous

I have some code that looks like this:
async move {
let res = do_sth(&state).await;
(state, res)
}.boxed()
(Full example: https://gitlab.com/msrd0/async-issue)
I'd say that the async move block takes ownership of state and passes a reference to state along to the do_sth method, which is an async fn. However, the compiler also keeps &state across the await boundary, and I have no idea why it would do that:
error: future cannot be sent between threads safely
--> src/main.rs:30:5
|
30 | }.boxed()
| ^^^^^ future returned by `read_all` is not `Send`
|
= help: the trait `std::marker::Sync` is not implemented for `(dyn std::any::Any + std::marker::Send + 'static)`
note: future is not `Send` as this value is used across an await
--> src/main.rs:28:14
|
28 | let res = do_sth(&state).await;
| ^^^^^^^------^^^^^^^- `&state` is later dropped here
| | |
| | has type `&gotham::state::State`
| await occurs here, with `&state` maybe used later
I tried placing the do_sth call without the await into its own block, but that didn't fix the error either.
Is there any way to avoid this error?

The error is pretty clearly not related to ownership or lifetimes:
error: future cannot be sent between threads safely
gotham_restful::State does not implement the Sync trait, which means that the reference &state is not safe to send to another thread (&T is Send only if T is Sync). You are passing that reference to an asynchronous function and holding it across the .await, so the compiler infers that the resulting future is not Send, and the entire async block becomes not Send as well. The return value of the read_all method has the + Send constraint, however, requiring the returned future to be thread-safe, so this causes an error.
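To see the general rule in isolation, here is a minimal self-contained sketch (not the code from the question; std::cell::Cell stands in for a Send-but-not-Sync type): a future that holds a reference to a non-Sync type across an .await is itself not Send, and a Send bound then rejects it.
use std::cell::Cell;

// Cell<u32> is Send but not Sync, so a shared reference `&Cell<u32>` is not Send.
async fn do_sth(_state: &Cell<u32>) {}

fn assert_send<T: Send>(_: T) {}

fn main() {
    // No reference is held across an await here, so this future is Send.
    let state = Cell::new(0);
    assert_send(async move {
        let _ = state.get();
    });

    // Here `&state` stays alive across the `.await`; since Cell is not Sync,
    // the reference is not Send, and neither is the future.
    let state = Cell::new(0);
    let fut = async move {
        do_sth(&state).await;
    };
    // assert_send(fut); // error: future cannot be sent between threads safely
    drop(fut);
}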
One possible solution is to rewrite do_sth to be a regular function that returns a future. This way you can ensure that the future returned from that function implements Send and is thread-safe, instead of relying on the compiler to infer whether it is thread-safe or not:
fn do_sth(_state: &State) -> impl Future<Output = NoContent> + Send {
// require that the future of this function is thread-safe ---^
async move {
Default::default()
}
}
Note that this will not actually allow you to do anything that is not thread-safe; it simply states up front that the future returned by do_sth must be Send, instead of leaving the compiler to infer whether it is or not.
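For context, the caller has roughly this shape (a stand-alone sketch: State and NoContent here are stand-ins for the gotham / gotham_restful types, and .boxed() is assumed to be the futures crate's FutureExt::boxed, which is what introduces the Send requirement):
use futures::future::{BoxFuture, FutureExt};
use std::future::Future;

// Stand-in types; the real ones come from gotham / gotham_restful.
struct State;
#[derive(Default)]
struct NoContent;

fn do_sth(_state: &State) -> impl Future<Output = NoContent> + Send {
    async move { Default::default() }
}

// `.boxed()` produces a `BoxFuture`, i.e. `Pin<Box<dyn Future + Send>>`,
// so the async block must be `Send`; that is where the constraint in the
// error message comes from.
fn read_all(state: State) -> BoxFuture<'static, (State, NoContent)> {
    async move {
        let res = do_sth(&state).await;
        (state, res)
    }
    .boxed()
}

fn main() {
    let (_state, _res) = futures::executor::block_on(read_all(State));
}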


Allow a future to store a pointer to a pinned value in its container

Prelude
I have been working on this segment of code that attempts to provide a reusable API for implementing an asynchronous stream for a REST paginator.
I have gone through many iterations and settled on storing the state in an enum that describes what point the process is at, both because I feel that it is the best fit for this purpose and because it is something to learn from, being especially explicit about the whole process. I do not want to use stream! or try_stream! from the async-stream crate.
The state begins at Begin, and moves a PaginationDelegate into the next state after using it to make a request. This state is Pending and owns the delegate and a future that is returned from PaginationDelegate::next_page.
The issue appears when the next_page method needs a reference, &self, but the self is not stored on the stack frame of the future that is stored within the Pending state.
I wanted to keep this "flat" because I find the algorithm easier to follow, but I also wanted to learn how to create this self-referential structure the most correct way. I am aware that I can wrap the future and have it own the PaginationDelegate, and indeed this may be the method I end up using. Nevertheless, I want to know how I could move the two values into the same holding structure and keep the pointer alive for my own education.
Delegate Trait
Here a PaginationDelegate is defined. This trait is intended to be implemented and used by any method or function that intends to return a PaginatedStream or dyn Stream. Its purpose is to define how the requests will be made, as well as to store a limited subset of the state (the offset for the next page from the REST API, and the total number of items that are expected from the API).
#[async_trait]
pub trait PaginationDelegate {
type Item;
type Error;
/// Performs an asynchronous request for the next page and returns either
/// a vector of the result items or an error.
async fn next_page(&self) -> Result<Vec<Self::Item>, Self::Error>;
/// Gets the current offset, which will be the index at the end of the
/// current/previous page. The value returned from this will be changed by
/// [`PaginatedStream`] immediately following a successful call to
/// [`next_page()`], increasing by the number of items returned.
fn offset(&self) -> usize;
/// Sets the offset for the next page. The offset is required to be the
/// index of the last item from the previous page.
fn set_offset(&mut self, value: usize);
/// Gets the total count of items that are currently expected from the API.
/// This may change if the API returns a different number of results on
/// subsequent pages, and may be less than what the API claims in its
/// response data if the API has a maximum limit.
fn total_items(&self) -> Option<usize>;
}
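For concreteness, an implementor might look something like this (ExampleDelegate and its fields are hypothetical, only meant to illustrate how the trait is intended to be used; it relies on the same async_trait import as above):
struct ExampleDelegate {
    offset: usize,
    total: Option<usize>,
}

#[async_trait]
impl PaginationDelegate for ExampleDelegate {
    type Item = String;
    type Error = ();

    async fn next_page(&self) -> Result<Vec<Self::Item>, Self::Error> {
        // A real implementation would issue a request using `self.offset`;
        // this sketch just fabricates a single-item page.
        Ok(vec![format!("item {}", self.offset)])
    }

    fn offset(&self) -> usize {
        self.offset
    }

    fn set_offset(&mut self, value: usize) {
        self.offset = value;
    }

    fn total_items(&self) -> Option<usize> {
        self.total
    }
}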
Stream State
The next segment is the enum itself, which serves as the implementor of Stream and the holder of the current state of the iterator.
Note that currently the Pending variant has the delegate and the future separate. I could have used future: Pin<Box<dyn Future<Output = Result<(D, Vec<D::Item>), D::Error>>>> to keep the delegate inside of the Future but prefer not to because I want to solve the underlying problem and not gloss over it. Also, the delegate field is a Pin<Box<D>> because I was experimenting and I feel that this is the closest I have gotten to a correct solution.
pub enum PaginatedStream<D: PaginationDelegate> {
Begin {
delegate: D,
},
Pending {
delegate: Pin<Box<D>>,
#[allow(clippy::type_complexity)]
future: Pin<Box<dyn Future<Output = Result<Vec<D::Item>, D::Error>>>>,
},
Ready {
delegate: D,
items: VecDeque<D::Item>,
},
Closed,
Indeterminate,
}
Stream Implementation
The last part is the implementation of Stream. This is incomplete for two reasons: I have not finished it, and it is best to keep the example short and minimal.
impl<D: 'static> Stream for PaginatedStream<D>
where
D: PaginationDelegate + Unpin,
D::Item: Unpin,
{
// If the state is `Pending` and the future resolves to an `Err`, that error is
// forwarded only once and the state set to `Closed`. If there is at least one
// result to return, the `Ok` variant is, of course, used instead.
type Item = Result<D::Item, D::Error>;
fn poll_next(mut self: Pin<&mut Self>, ctx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
// Avoid using the full namespace to match all variants.
use PaginatedStream::*;
// Take ownership of the current state (`self`) and replace it with the
// `Indeterminate` state until the new state is in fact determined.
let this = std::mem::replace(&mut *self, Indeterminate);
match this {
// This state only occurs at the entry of the state machine. It only holds the
// `PaginationDelegate` that will be used to update the offset and make new requests.
Begin { delegate } => {
// Pin the delegate to the heap to ensure that it doesn't move and that pointers
// remain valid even after moving the value into the new state.
let delegate = Box::pin(delegate);
// Set the current state to `Pending`, after making the next request using the
// pinned delegate.
self.set(Pending {
delegate,
future: PaginationDelegate::next_page(delegate.as_ref()),
});
// Return the distilled version of the new state to the caller, indicating that a
// new request has been made and we are waiting for new data.
Poll::Pending
}
// At some point in the past this stream was polled and made a new request. Now it is
// time to poll the future returned from that request that was made, and if results are
// available, unpack them to the `Ready` state and move the delegate. If the future
// still doesn't have results, set the state back to `Pending` and move the fields back
// into position.
Pending { delegate, future } => todo!(),
// The request has resolved with data in the past, and there are items ready for us to
// provide to the caller. In the event that there are no more items in the `VecDeque`, we
// will make the next request and construct the state for `Pending` again.
Ready { delegate, items } => todo!(),
// Either an error has occurred, or the last item has been yielded already. Nobody
// should be polling anymore, but to be nice, just tell them that there are no more
// results with `Poll::Ready(None)`.
Closed => Poll::Ready(None),
// The `Indeterminate` state should have only been used internally and reset back to a
// valid state before yielding the `Poll` to the caller. This branch should never be
// reached, if it is, that is a panic.
Indeterminate => unreachable!(),
}
}
}
Compiler Messages
At the moment, in the Begin branch, there are two compiler messages where the borrow of the delegate (delegate.as_ref()) is taken and passed to the PaginationDelegate::next_page method.
The first is that the delegate does not live long enough, because the pinned value is moved into the new state variant Pending and no longer resides at the position where it was assigned. I do not understand why the compiler wants this to exist for 'static though, and would appreciate it if this could be explained.
error[E0597]: `delegate` does not live long enough
--> src/lib.rs:90:59
|
90 | future: PaginationDelegate::next_page(delegate.as_ref()),
| ------------------------------^^^^^^^^^^^^^^^^^-
| | |
| | borrowed value does not live long enough
| cast requires that `delegate` is borrowed for `'static`
...
96 | }
| - `delegate` dropped here while still borrowed
I would also like to hear any methods you have for creating the values for fields of a struct that rely on data that should be moved into the struct (self-referential, the main issue of this entire post). I know it is wrong (and impossible) to use MaybeUninit here because any placeholder value that would later be dropped will cause undefined behavior. Possibly show me a method for allocating a structure of uninitialized memory and then overwriting those fields with values after they have been constructed, without letting the compiler attempt to free the uninitialized memory.
The second compiler message is as follows, which is similar to the first except that the temporary value for delegate is moved into the struct. I am to understand that this is fundamentally the same issue described above, but just explained differently by two separate heuristics. Is my understanding wrong?
error[E0382]: borrow of moved value: `delegate`
--> src/lib.rs:90:59
|
84 | let delegate = Box::pin(delegate);
| -------- move occurs because `delegate` has type `Pin<Box<D>>`, which does not implement the `Copy` trait
...
89 | delegate,
| -------- value moved here
90 | future: PaginationDelegate::next_page(delegate.as_ref()),
| ^^^^^^^^^^^^^^^^^ value borrowed here after move
Environment
This is real code, but I believe it is already an MCVE.
To set up the environment for this, the crate dependencies are as follows.
[dependencies]
futures-core = "0.3"
async-trait = "0.1"
And the imports that are used in the code,
use std::collections::VecDeque;
use std::pin::Pin;
use std::task::{Context, Poll};
use async_trait::async_trait;
use futures_core::{Future, Stream};
The potential solution that I did not want to use, because it hides the underlying issue (or rather avoids the intent of this question entirely) follows.
Where the PaginatedStream enum is defined, change the Pending variant to the following.
Pending {
#[allow(clippy::type_complexity)]
future: Pin<Box<dyn Future<Output = Result<(D, Vec<D::Item>), D::Error>>>>,
},
Now, inside the implementation of Stream change the matching arm for Begin to the following.
// This state only occurs at the entry of the state machine. It only holds the
// `PaginationDelegate` that will be used to update the offset and make new requests.
Begin { delegate } => {
self.set(Pending {
// Construct a new future that awaits the result and has a new type for `Output`
// that contains both the result and the moved delegate.
// Here the delegate is moved into the future via the `async` block.
future: Box::pin(async {
let result = delegate.next_page().await;
result.map(|items| (delegate, items))
}),
});
// Return the distilled version of the new state to the caller, indicating that a
// new request has been made and we are waiting for new data.
Poll::Pending
}
The compiler knows that this async block is really an async move, because delegate is consumed inside it; you could be more explicit if you wanted. This effectively moves the delegate into the stack frame of the future that is boxed and pinned, ensuring that whenever the value is moved in memory the two values move together and the pointer cannot be invalidated.
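Written out explicitly, the future construction from that arm is equivalent to the following (same behaviour, just spelling out the move):
future: Box::pin(async move {
    let result = delegate.next_page().await;
    result.map(|items| (delegate, items))
}),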
The other matching arm for Pending needs to be updated to reflect the change in signature. Here is a complete implementation of the logic.
// At some point in the past this stream was polled and asked the delegate to make a new
// request. Now it is time to poll the future returned from that request that was made,
// and if results are available, unpack them to the `Ready` state and move
// the delegate. If the future still doesn't have results, set the state
// back to `Pending` and move the fields back into position.
Pending { mut future } => match future.as_mut().poll(ctx) {
// The future from the last request returned successfully with new items,
// and gave the delegate back.
Poll::Ready(Ok((mut delegate, items))) => {
// Tell the delegate the offset for the next page, which is the sum of the
// old offset and the number of items that the API sent back.
delegate.set_offset(delegate.offset() + items.len());
// Construct a new `VecDeque` so that the items can be popped from the front.
// This should be more efficient than reversing the `Vec`, and less confusing.
let mut items = VecDeque::from(items);
// Get the first item out so that it can be yielded. The case where there are no
// more items should have been handled by the `Ready` branch, so it should be
// safe to unwrap.
let popped = items.pop_front().unwrap();
// Set the new state to `Ready` with the delegate and the items.
self.set(Ready { delegate, items });
Poll::Ready(Some(Ok(popped)))
}
// The future from the last request returned with an error.
Poll::Ready(Err(error)) => {
// Set the state to `Closed` so that any future polls will return
// `Poll::Ready(None)`. The caller can even match against this if needed.
self.set(Closed);
// Forward the error to whoever polled. This will only happen once because the
// error is moved, and the state set to `Closed`.
Poll::Ready(Some(Err(error)))
}
// The future from the last request is still pending.
Poll::Pending => {
// Because the state is currently `Indeterminate` it must be set back to what it
// was. This will move the future back into the state.
self.set(Pending { future });
// Tell the callee that we are still waiting for a response.
Poll::Pending
}
},

Calling async function from closure

I want to await an async function inside a closure used in an iterator. The function requiring the closure is called inside a struct implementation. I can't figure out how to do this.
This code simulates what I'm trying to do:
struct MyType {}
impl MyType {
async fn foo(&self) {
println!("foo");
(0..2).for_each(|v| {
self.bar(v).await;
});
}
async fn bar(&self, v: usize) {
println!("bar: {}", v);
}
}
#[tokio::main]
async fn main() {
let mt = MyType {};
mt.foo().await;
}
Obviously, this will not work since the closure is not async, giving me:
error[E0728]: `await` is only allowed inside `async` functions and blocks
--> src/main.rs:8:13
|
7 | (0..2).for_each(|v| {
| --- this is not `async`
8 | self.bar(v).await;
| ^^^^^^^^^^^^^^^^^ only allowed inside `async` functions and blocks
After looking for an answer on how to call an async function from a non-async one, I ended up with this:
tokio::spawn(async move {
self.bar(v).await;
});
But now I'm hitting lifetime issues instead:
error[E0759]: `self` has an anonymous lifetime `'_` but it needs to satisfy a `'static` lifetime requirement
--> src/main.rs:4:18
|
4 | async fn foo(&self) {
| ^^^^^
| |
| this data with an anonymous lifetime `'_`...
| ...is captured here...
...
8 | tokio::spawn(async move {
| ------------ ...and is required to live as long as `'static` here
This also doesn't surprise me since, from what I understand, the Rust compiler cannot know how long a spawned task will live. Given this, the task spawned with tokio::spawn might outlive the type MyType.
The first fix I came up with was to make bar an associated function, copy everything I need in my closure, pass it as a value to bar and call it with MyType::bar(copies_from_self), but this is getting ugly since there's a lot of copying. It also feels like a workaround for not knowing how lifetimes work.
I was instead trying to use futures::executor::block_on which works for simple tasks like the one in this post:
(0..2).for_each(|v| {
futures::executor::block_on(self.bar(v));
});
But when putting this in my real-life example, where I use a third-party library1 which also uses tokio, things no longer work. After reading the documentation, I realise that #[tokio::main] is a macro that eventually wraps everything in block_on, so by doing this there will be nested block_on calls. This might be the reason why one of the async methods called in bar just stops working without any error or logging (it works without block_on, so it shouldn't be anything wrong with the code). I reached out to the authors, who said that I could use for_each(|i| async move { ... }), which made me even more confused.
(0..2).for_each(|v| async move {
self.bar(v).await;
});
This will result in the compilation error:
expected `()`, found opaque type
which I think makes sense since I'm now returning a future and not (). My naive approach to this was to try and await the future with something like this:
(0..2).for_each(|v| {
async move {
self.bar(v).await;
}
.await
});
But that takes me back to square one, resulting in the following compilation error, which I also think makes sense since I'm now back to using await in the closure, which is sync.
only allowed inside `async` functions and blocks
This discovery also makes it hard for me to make use of answers such as the ones found here and here.
The question after all this cargo cult programming is basically: is it possible, and if so, how do I call my async function from the closure in an iterator (preferably without spawning a thread, to avoid lifetime problems)? If this is not possible, what would an idiomatic implementation for this look like?
1 This is the library/method used
Iterator::for_each expects a synchronous closure, thus you can't use .await in it (not directly at least), nor can you return a future from it.
One solution is to just use a for loop instead of .for_each:
for v in 0..2 {
self.bar(v).await;
}
The more general approach is to use streams instead of iterators, since those are the asynchronous equivalent (and the equivalent methods on streams are typically asynchronous as well). This would work not only for for_each but for most other iterator methods:
use futures::prelude::*;
futures::stream::iter(0..2)
.for_each(|v| async move {
self.bar(v).await;
})
.await;
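For completeness, here is the question's example rewritten with this approach; a sketch that compiles with the tokio and futures crates as dependencies:
use futures::prelude::*;

struct MyType {}

impl MyType {
    async fn foo(&self) {
        println!("foo");
        // Iterate asynchronously: each closure call returns a future that is
        // awaited by `StreamExt::for_each` before the next item is processed.
        futures::stream::iter(0..2)
            .for_each(|v| async move {
                self.bar(v).await;
            })
            .await;
    }

    async fn bar(&self, v: usize) {
        println!("bar: {}", v);
    }
}

#[tokio::main]
async fn main() {
    let mt = MyType {};
    mt.foo().await;
}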

Async method that consumes self causes error "returns a value referencing data owned by the current function"

Is it possible to consume self in an async method?
Here is my method (please note that the poller is unrelated to the poll method in Rust futures).
pub async fn all(self) -> WebDriverResult<Vec<WebElement<'a>>> {
let elements = self.poller.run(&self.selectors).await?;
Ok(elements)
}
and for completeness, here is the struct it belongs to:
pub struct ElementQuery<'a> {
session: &'a WebDriverSession,
poller: ElementPoller,
selectors: Vec<ElementSelector<'a>>,
}
NOTE: session is passed in via the new() fn. It is owned by the caller that creates ElementQuery and runs it to completion. The above method should return Vec<WebElement>.
The error says:
error[E0515]: cannot return value referencing local data `self.poller`
--> src/query.rs:193:9
|
193 | self.poller.run(&self.selectors).await
| -----------^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| returns a value referencing data owned by the current function
| `self.poller` is borrowed here
However, as far as I can tell, I am not returning any references owned by the current function. The only reference being returned (inside WebElement) is session, which I passed into the struct to begin with.
I can change the ? to unwrap() so that the only return path is Ok(elements) just to eliminate any other possibilities (same error).
Here is WebElement:
#[derive(Debug, Clone)]
pub struct WebElement<'a> {
pub element_id: ElementId,
pub session: &'a WebDriverSession,
}
ElementId is just a newtype wrapper around String - no references.
Is this something to do with the method returning a Future that may execute later, and perhaps with something being held across an .await boundary?
If the ONLY reference being returned is &WebDriverSession, and that same reference was passed in via new(), then why is it telling me that I'm returning a reference owned by the function?
EDIT: Managed to get this to compile by effectively inlining some code rather than calling additional async functions. Still not sure why this fails, other than it seems to be related to holding a reference across .await boundaries, even if that reference is not owned by the function/struct on either side.
Here's a playground link with a simplified example:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=f38546480aaeb3594c8c17fec9b7d62c

When should a function be effectful?

When I use the FFI to wrap some API (for example, the DOM API), is there any rule of thumb that could help me decide whether a function should be effectful or not?
Here is an example:
foreign import querySelectorImpl """
function querySelectorImpl (Nothing) {
return function (Just) {
return function (selector) {
return function (src) {
return function () {
var result = src.querySelector(selector);
return result ? Just(result) : Nothing;
};
};
};
};
}
""" :: forall a e. Maybe a -> (a -> Maybe a) -> String -> Node -> Eff (dom :: DOM | e) (Maybe Node)
querySelector :: forall e. String -> Node -> Eff (dom :: DOM | e) (Maybe Node)
querySelector = querySelectorImpl Nothing Just
foreign import getTagName """
function getTagName (n) {
return function () {
return n.tagName;
};
}
""" :: forall e. Node -> Eff (dom :: DOM | e) String
It feels right for querySelector to be effectful, but I'm not quite sure about getTagName.
Update:
I understand what a pure function is and that it should not change the state of the program, and maybe the DOM was a bad example.
I ask this question because in most libraries that wrap existing JS libraries pretty much every function is effectful, even when it doesn't feel right. So maybe my actual question is: does this effect represent a real need in the wrapped JS library, or is it there just in case it is stateful inside?
If a function does not change state, and it always (past, present, and future) returns the same value when given the same arguments, then it does not need to return Eff, otherwise it does.
n.tagName is read-only, and as far as I know, it never changes. Therefore, getTagName is pure, and it's okay to not return Eff.
On the other hand, a getTextContent function must return Eff. It does not change state, but it does return different values at different times.
The vast vast vast majority of JS APIs (including the DOM) are effectful. getTagName is one of the very few exceptions. So when writing an FFI, PureScript authors just assume that all JS functions return Eff, even in the rare situations where they don't need to.
Thankfully the most recent version of purescript-dom uses non-Eff functions for nodeName, tagName, localName, etc.
Effectful functions are functions that are not pure. From Wikipedia:
In computer programming, a function may be described as a pure function if both these statements about the function hold:
The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change as program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices [...].
Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices [...].
Since the DOM stores state, functions wrapping calls to the DOM are almost always effectful.
For more details regarding PureScript, see Handling Native Effects with the Eff Monad.

Generic reply from agent/mailboxprocessor?

I currently have an agent that does heavy data processing by constantly posting "work" messages to itself.
Sometimes clients of this agent want to interrupt this processing to access the data in a safe manner.
For this I thought that posting an async to the agent that the agent can run whenever it's in a safe state would be nice. This works fine and the message looks like this:
type Message = | Sync of Async<unit> * AsyncReplyChannel<unit>
And the agent processing simply becomes:
match mailbox.Receive () with
| Sync (async, reply) -> async |> Async.RunSynchronously |> reply.Reply
This works great as long as clients don't need to return some value from the async, since I've constrained the async/reply to be of type unit and I cannot use a generic type in the discriminated union.
My best attempts to solve this have involved wrapper asyncs and wait handles, but this seems messy and not as elegant as I've come to expect from F#. I'm also new to async workflows in F#, so it's very possible that I've missed/misunderstood some concepts here.
So the question is: how can I return generic types in an agent response?
The thing that makes this difficult is that, in your current version, the agent would somehow have to calculate the value and then pass it to the channel, without knowing what the type of the value is. Doing that in a statically typed way in F# is tricky.
If you make the message generic, then it will work, but the agent will only be able to handle messages of one type (the type T in Message<T>).
An alternative is to simply pass Async<unit> to the agent and let the caller do the value passing for each specific type. So, you can write the message and agent like this:
type Message = | Sync of Async<unit>
let agent = MailboxProcessor.Start(fun inbox -> async {
    while true do
        let! msg = inbox.Receive ()
        match msg with
        | Sync (work) -> do! work })
When you use PostAndReply, you get access to the reply channel. Rather than making the channel part of the message type, you can just use it inside the async block that you pass to the agent:
let num = agent.PostAndReply(fun chan -> Sync(async {
    let ret = 42
    chan.Reply(ret) }))
let str = agent.PostAndReply(fun chan -> Sync(async {
    let ret = "hi"
    chan.Reply(ret) }))

Resources