I currently have an agent that does heavy data processing by constantly posting "work" messages to itself.
Sometimes clients of this agent want to interrupt this processing to access the data in a safe manner.
For this I thought that posting an async to the agent that the agent can run whenever it's in a safe state would be nice. This works fine and the message looks like this:
type Message = |Sync of Async<unit>*AsyncReplyChannel<unit>
And the agent processing simply becomes:
match mailbox.Receive () with
| Sync (async, reply) -> async |> Async.RunSynchronously |> reply.Reply
This works great as long as clients don't need to return some value from the async as I've constrained the async/reply to be of type unit and I cannot use a generic type in the discriminated union.
My best attempts to solve this have involved wrapper asyncs and wait handles, but this seems messy and not as elegant as I've come to expect from F#. I'm also new to async workflows in F#, so it's very possible that I've missed or misunderstood some concepts here.
So the question is: how can I return generic types in an agent response?
The thing that makes this difficult is that, in your current version, the agent would somehow have to calculate the value and then pass it to the channel without knowing the type of the value. Doing that in a statically typed way in F# is tricky.
If you make the message generic, then it will work, but the agent will only be able to handle messages of one type (the type T in Message<T>).
An alternative is to simply pass Async<unit> to the agent and let the caller do the value passing for each specific type. So, you can write message & agent just like this:
type Message = | Sync of Async<unit>
let agent = MailboxProcessor.Start(fun inbox -> async {
    while true do
        let! msg = inbox.Receive ()
        match msg with
        | Sync (work) -> do! work })
When you use PostAndReply, you get access to the reply channel - rather than passing the channel to the agent, you can just use it in the local async block:
let num = agent.PostAndReply(fun chan -> Sync(async {
    let ret = 42
    chan.Reply(ret) }))
let str = agent.PostAndReply(fun chan -> Sync(async {
    let ret = "hi"
    chan.Reply(ret) }))
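The same trick carries over to other languages. As a rough sketch in Rust (standard library only; start_agent, post_and_reply and the Vec<i32> state are names made up for this illustration), the worker accepts type-erased closures and each caller captures its own typed reply channel inside the closure, so the message type never mentions the reply type:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical message type: a type-erased job that the worker runs against its state.
type Job = Box<dyn FnOnce(&mut Vec<i32>) + Send>;

// Spawns a worker thread that owns the data and returns the sending half of its mailbox.
fn start_agent() -> mpsc::Sender<Job> {
    let (tx, rx) = mpsc::channel::<Job>();
    thread::spawn(move || {
        let mut data = vec![1, 2, 3];
        // Run each job in turn; the loop ends once every sender has been dropped.
        for job in rx {
            job(&mut data);
        }
    });
    tx
}

// Posts a job and blocks for its reply. The reply type T is known only to the caller.
fn post_and_reply<T, F>(agent: &mpsc::Sender<Job>, f: F) -> T
where
    T: Send + 'static,
    F: FnOnce(&mut Vec<i32>) -> T + Send + 'static,
{
    let (reply_tx, reply_rx) = mpsc::channel::<T>();
    // The typed reply channel is captured by the closure, not carried in the message type.
    let job: Job = Box::new(move |data: &mut Vec<i32>| {
        let _ = reply_tx.send(f(data));
    });
    agent.send(job).expect("agent stopped");
    reply_rx.recv().expect("agent dropped the reply channel")
}

fn main() {
    let agent = start_agent();
    let sum: i32 = post_and_reply(&agent, |data| data.iter().sum::<i32>());
    let label: String = post_and_reply(&agent, |data| format!("{} items", data.len()));
    println!("{} {}", sum, label);
}
```

As in the F# version, the agent only ever sees "some work to run", while the caller decides what comes back and of which type.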
Related
I'm making an API where the user can submit items to be processed, and they might want to check whether their item was processed successfully. I thought that this would be a good place to use tokio::sync::oneshot channels, where I'd return the receiver to the caller, and they can later await on it to get the result they're looking for.
let processable_item = ...;
let where_to_submit: impl Submittable = get_submit_target();
let status_handle: oneshot::Receiver<SubmissionResult> = where_to_submit.submit(processable_item).await;
// ... do something that does not depend on the SubmissionResult ...
// Now we want to get the status of our submission
let status = status_handle.await;
Submitting the item involves creating a oneshot channel, and putting the Sender half into a queue while the Receiver goes back to the calling code:
#[async_trait]
impl Submittable for Something {
    async fn submit(&self, item: ProcessableItem) -> oneshot::Receiver<SubmissionResult> {
        let (sender, receiver) = oneshot::channel();
        // Put the item, with the associated sender, into a queue
        let queue: mpsc::Sender<(ProcessableItem, oneshot::Sender<SubmissionResult>)> = get_processing_queue();
        queue.send((item, sender)).await.expect("Processing task closed!");
        receiver
    }
}
When I do this, cargo clippy complains (via the [clippy::async_yields_async] lint) that I'm returning oneshot::Receiver, which can be awaited, from an async function, and suggests that I await it then.
This is not what I wanted, which is to allow a degree of background processing while the user doesn't need the SubmissionResult yet, as opposed to making them wait until it's available.
Is this API even a good idea? Does there exist a common approach to doing this?
Looks fine to me. This is a false positive of Clippy, so you can just silence it: #[allow(clippy::async_yields_async)].
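For readers who want to see the "hand back the receiver" shape in isolation, here is a minimal sketch that swaps tokio's oneshot and mpsc channels for std::sync::mpsc and a plain worker thread, so it runs without any crates. ProcessableItem, SubmissionResult, start_worker and the even/odd processing rule are placeholders invented for this example, not part of the question's code:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-ins for the question's types.
type ProcessableItem = u32;
type SubmissionResult = Result<u32, String>;
type Queued = (ProcessableItem, mpsc::Sender<SubmissionResult>);

// Starts a background worker and returns the sending half of its queue.
fn start_worker() -> mpsc::Sender<Queued> {
    let (tx, rx) = mpsc::channel::<Queued>();
    thread::spawn(move || {
        for (item, reply) in rx {
            // Placeholder processing: even items are doubled, odd items are rejected.
            let result = if item % 2 == 0 {
                Ok(item * 2)
            } else {
                Err(format!("item {} rejected", item))
            };
            // The caller may have dropped its handle already; that is not an error here.
            let _ = reply.send(result);
        }
    });
    tx
}

// Queues the item and immediately hands back a handle to the eventual result.
fn submit(queue: &mpsc::Sender<Queued>, item: ProcessableItem) -> mpsc::Receiver<SubmissionResult> {
    let (sender, receiver) = mpsc::channel();
    queue.send((item, sender)).expect("processing thread closed");
    receiver
}

fn main() {
    let queue = start_worker();
    let handle = submit(&queue, 4);
    // ... do something that does not depend on the result ...
    let status = handle.recv().expect("worker dropped the sender");
    println!("{:?}", status);
}
```

The caller only blocks at the point where it actually asks for the status, which is the behaviour the question is after.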
I have some code that looks like this:
async move {
let res = do_sth(&state).await;
(state, res)
}.boxed()
(Full example: https://gitlab.com/msrd0/async-issue)
I'd say that the async move block takes ownership of state and passes a reference of state along to the do_sth method, which is an async fn. However, the compiler also keeps &state across the await bound, and I have no idea why it would do that:
error: future cannot be sent between threads safely
--> src/main.rs:30:5
|
30 | }.boxed()
| ^^^^^ future returned by `read_all` is not `Send`
|
= help: the trait `std::marker::Sync` is not implemented for `(dyn std::any::Any + std::marker::Send + 'static)`
note: future is not `Send` as this value is used across an await
--> src/main.rs:28:14
|
28 | let res = do_sth(&state).await;
| ^^^^^^^------^^^^^^^- `&state` is later dropped here
| | |
| | has type `&gotham::state::State`
| await occurs here, with `&state` maybe used later
I tried placing the do_sth call without the await into its own block, but that didn't fix the error either.
Is there any way to avoid this error?
The error is pretty clearly not related to ownership or lifetimes:
error: future cannot be sent between threads safely
gotham_restful::State does not implement the Sync trait, which means that a reference to it (&state) is not Send and so cannot safely be handed to another thread. However, you are passing that reference to an asynchronous function, which is then awaited. The Rust compiler automatically infers that the future returned by that function is not thread-safe, so the entire block becomes "not thread safe". The return value of the read_all method has the + Send constraint, however, requiring the returned future to be thread-safe, so this causes an error.
One possible solution is to rewrite do_sth to be a regular function that returns a future. This way you can ensure that the returned future from that function implements Send and is thread-safe, instead of relying on the compiler to infer whether it is thread-safe or not:
fn do_sth(_state: &State) -> impl Future<Output = NoContent> + Send {
    // require that the future of this function is thread-safe ---^
    async move {
        Default::default()
    }
}
Note that this will not actually allow you to do anything that is not thread-safe; it only tells the compiler that the future returned by do_sth must be thread-safe, instead of leaving the compiler to infer whether it is.
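To make the Send/Sync relationship concrete, here is a small sketch using only the standard library. State below is a made-up type (not gotham's) that is Send but not Sync because it holds a Cell, and assert_send is a helper introduced just for this example. The sketch assumes do_sth can copy what it needs out of the state before the async block starts, so the returned future never holds a &State:

```rust
use std::cell::Cell;
use std::future::Future;

// Compiles only if the argument's type is Send.
fn assert_send<T: Send>(_: &T) {}

// Hypothetical stand-in for a state type that is Send but not Sync.
struct State {
    hits: Cell<u32>,
}

// The Send bound is part of the signature, so it is checked here, at the definition.
// The data is copied out before the async block, so the future holds no &State.
fn do_sth(state: &State) -> impl Future<Output = u32> + Send {
    let hits = state.hits.get();
    async move { hits + 1 }
}

// Builds the outer future in the same shape as the question, then checks that it is Send.
fn outer_is_send() -> bool {
    let state = State { hits: Cell::new(41) };
    let fut = async move {
        let pending = do_sth(&state);
        let res = pending.await;
        (state, res)
    };
    // This line would fail to compile if the future captured a &State across the await.
    assert_send(&fut);
    true
}

fn main() {
    println!("outer future is Send: {}", outer_is_send());
}
```

The future is never polled here; the point is only that the Send check passes at compile time once the bound is stated on do_sth.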
In my RSS reader project, I want to read my RSS feeds asynchronously. Currently, they're read synchronously thanks to this code block
self.feeds = self
.feeds
.iter()
.map(|f| f.read(&self.settings))
.collect::<Vec<Feed>>();
I want to make that code asynchronous, because it will allow me to better handle poor web server responses.
I understand I can use a Stream that I can create from my Vec using stream::from_iter(...) which transforms the code into something like
self.feeds = stream::from_iter(self.feeds.iter())
.map(|f| f.read(&self.settings))
// ???
.collect::<Vec<Feed>>()
}
But then, I have two questions
How to have results joined into a Vec (which is a synchronous struct)?
How to execute that stream? I was thinking about using task::spawn but it doesn't seem to work ...
How to execute that stream? I was thinking about using task::spawn but it doesn't seem to work
In the async/await world, asynchronous code is meant to be executed by an executor, which is not part of the standard library but is provided by third-party crates such as tokio. task::spawn only schedules a single future to run on that executor; it does not run the future on the spot.
How to have results joined into a vec (which is a sync struct)
The bread and butter of your rss reader seems to be f.read. It should be turned into an asynchronous function. Then the vector of feeds will be mapped into a vector of futures, which need to be polled to completion.
The futures crate has futures::stream::futures_unordered::FuturesUnordered to help you do that. FuturesUnordered itself implements Stream trait. This stream is then collected into the result vector and awaited to completion like so:
//# tokio = { version = "0.2.4", features = ["full"] }
//# futures = "0.3.1"
use tokio::time::delay_for;
use futures::stream::StreamExt;
use futures::stream::futures_unordered::FuturesUnordered;
use std::error::Error;
use std::time::{Duration, Instant};
#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let start = Instant::now();
    let feeds = (0..10).collect::<Vec<_>>();
    let res = read_feeds(feeds).await;
    dbg!(res);
    dbg!(start.elapsed());
    Ok(())
}
async fn read_feeds(feeds: Vec<u32>) -> Vec<u32> {
    feeds.iter()
        .map(read_feed)
        .collect::<FuturesUnordered<_>>()
        .collect::<Vec<_>>()
        .await
}
async fn read_feed(feed: &u32) -> u32 {
    delay_for(Duration::from_millis(500)).await;
    feed * 2
}
delay_for is to simulate the potentially expensive operation. It also helps to demonstrate that these readings indeed happen concurrently without any explicit thread related logic.
One nuance here: unlike the synchronous version, the results are no longer in the same order as the feeds themselves; whichever read finishes first ends up at the front. You need to deal with that somehow.
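One way to deal with the ordering is to tag each result with the index of its feed and sort once everything has arrived. The sketch below shows that idea with plain threads and std::sync::mpsc instead of futures, purely so that it runs without an executor; read_feed and its artificial delays are invented for the example. With the futures crate, FuturesOrdered or futures::future::join_all should give input-ordered results directly.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for reading one feed; later feeds finish sooner here.
fn read_feed(feed: u32) -> u32 {
    let delay = 50u64.saturating_sub(u64::from(feed) * 10);
    thread::sleep(Duration::from_millis(delay));
    feed * 2
}

// Reads all feeds concurrently and returns the results in the order of `feeds`.
fn read_feeds(feeds: &[u32]) -> Vec<u32> {
    let (tx, rx) = mpsc::channel();
    for (index, &feed) in feeds.iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            // Tag the result with its position so the order can be restored later.
            let _ = tx.send((index, read_feed(feed)));
        });
    }
    // Drop the original sender so the receiver stops once all workers are done.
    drop(tx);
    // Results arrive in completion order, not input order.
    let mut tagged: Vec<(usize, u32)> = rx.iter().collect();
    tagged.sort_by_key(|&(index, _)| index);
    tagged.into_iter().map(|(_, value)| value).collect()
}

fn main() {
    println!("{:?}", read_feeds(&[0, 1, 2, 3, 4]));
}
```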
I understand how to make a message based non-blocking application in akka, and can easily mock up examples that perform
concurrent operations and pass back the aggregated results in a message. Where I have difficulty is understanding what my
non-blocking options are when my application has to respond to an HTTP request. The goal is to receive a request and
immediately hand it over to a local or remote actor to do the work, which in turn will hand it off to get a result that
could take some time. Unfortunately, under this model, I don't understand how I could express this with a non-blocking
series of "tells" rather than blocking "asks". If at any point in the chain I use a tell, I no longer have a future to
use as the eventual response content (required by the http framework interface which in this case is finagle - but that is not
important). I understand the request is on its own thread, and my example is quite contrived, but just trying to
understand my design options.
In summary, if my contrived example below can be reworked to block less, I would very much love to understand how. This is my
first use of akka since some light exploration a year+ ago, and every article, document, and talk I have viewed says
not to block for services.
Conceptual answers may be helpful but may also be the same as what I have already read. Working/Editing my example
would likely be key to my understanding of the exact problem I am attempting to solve. If the current example is generally
what needs to be done that confirmation is helpful too, so I don't search for magic that does not exist.
Note the following aliases: import com.twitter.util.{Future => TwitterFuture, Await => TwitterAwait}
object Server {
val system = ActorSystem("Example-System")
implicit val timeout = Timeout(1 seconds)
implicit def scalaFuture2twitterFuture[T](scFuture: Future[T]): TwitterFuture[T] = {
val promise = TwitterPromise[T]
scFuture onComplete {
case Success(result) ⇒ promise.setValue(result)
case Failure(failure) ⇒ promise.setException(failure)
}
promise
}
val service = new Service[HttpRequest, HttpResponse] {
def apply(req: HttpRequest): TwitterFuture[HttpResponse] = req.getUri match {
case "/a/b/c" =>
val w1 = system.actorOf(Props(new Worker1))
val r = w1 ? "take work"
val response: Future[HttpResponse] = r.mapTo[String].map { c =>
val resp = new DefaultHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK)
resp.setContent(ChannelBuffers.copiedBuffer(c, CharsetUtil.UTF_8))
resp
}
response
}
}
//val server = Http.serve(":8080", service); TwitterAwait.ready(server)
class Worker1 extends Actor with ActorLogging {
def receive = {
case "take work" =>
val w2 = context.actorOf(Props(new Worker2))
pipe (w2 ? "do work") to sender
}
}
class Worker2 extends Actor with ActorLogging {
def receive = {
case "do work" =>
//Long operation...
sender ! "The Work"
}
}
def main(args: Array[String]) {
val r = service.apply(
com.twitter.finagle.http.Request("/a/b/c")
)
println(TwitterAwait.result(r).getContent.toString(CharsetUtil.UTF_8)) // prints The Work
}
}
Thanks in advance for any guidance offered!
You can avoid sending a future as a message by using the pipe pattern—i.e., in Worker1 you'd write:
pipe(w2 ? "do work") to sender
Instead of:
sender ! (w2 ? "do work")
Now r will be a Future[String] instead of a Future[Future[String]].
Update: the pipe solution above is a general way to avoid having your actor respond with a future. As Viktor points out in a comment below, in this case you can take your Worker1 out of the loop entirely by telling Worker2 to respond directly to the actor that it (Worker1) got the message from:
w2.tell("do work", sender)
This won't be an option if Worker1 is responsible for operating on the response from Worker2 in some way (by using map on w2 ? "do work", combining multiple futures with flatMap or a for-comprehension, etc.), but if that's not necessary, this version is cleaner and more efficient.
That kills one Await.result. You can get rid of the other by writing something like the following:
val response: Future[HttpResponse] = r.mapTo[String].map { c =>
val resp = new DefaultHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK)
resp.setContent(ChannelBuffers.copiedBuffer(c, CharsetUtil.UTF_8))
resp
}
Now you just need to turn this Future into a TwitterFuture. I can't tell you off the top of my head exactly how to do this, but it should be fairly trivial, and definitely doesn't require blocking.
You definitely don't have to block at all here. First, update your import for the twitter stuff to:
import com.twitter.util.{Future => TwitterFuture, Await => TwitterAwait, Promise => TwitterPromise}
You will need the twitter Promise as that's the impl of Future you will return from the apply method. Then, follow what Travis Brown said in his answer so your actor is responding in such a way that you do not have nested futures. Once you do that, you should be able to change your apply method to something like this:
def apply(req: HttpRequest): TwitterFuture[HttpResponse] = req.getUri match {
case "/a/b/c" =>
val w1 = system.actorOf(Props(new Worker1))
val r = (w1 ? "take work").mapTo[String]
val prom = new TwitterPromise[HttpResponse]
r.map(toResponse) onComplete{
case Success(resp) => prom.setValue(resp)
case Failure(ex) => prom.setException(ex)
}
prom
}
def toResponse(c:String):HttpResponse = {
val resp = new DefaultHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK)
resp.setContent(ChannelBuffers.copiedBuffer(c, CharsetUtil.UTF_8))
resp
}
This probably needs a little more work. I didn't set it up in my IDE, so I can't guarantee that it compiles, but I believe the idea to be sound. What you return from the apply method is a TwitterFuture that is not yet completed. It will be completed when the future from the actor ask (?) is done, and that happens via a non-blocking onComplete callback.
For a broader context, here is my code, which downloads a list of URLs.
It seems to me that there is no good way to handle timeouts in F# when using use! response = request.AsyncGetResponse() style URL fetching. I have pretty much everything working as I'd like it to (error handling and asynchronous request and response downloading), save the problem that occurs when a website takes a long time to respond. My current code just hangs indefinitely. I've tried it on a PHP script I wrote that waits 300 seconds. It waited the whole time.
I have found "solutions" of two sorts, both of which are undesirable.
AwaitIAsyncResult + BeginGetResponse
Like the answer by ildjarn on this other Stack Overflow question. The problem with this is that if you have queued many asynchronous requests, some are artificially blocked on AwaitIAsyncResult. In other words, the call to make the request has been made, but something behind the scenes is blocking the call. This causes the time-out on AwaitIAsyncResult to be triggered prematurely when many concurrent requests are made. My guess is a limit on the number of requests to a single domain or just a limit on total requests.
To support my suspicion, I wrote a little WPF application to draw a timeline of when the requests seem to be starting and ending. In my code linked above, notice that the timer starts and stops on lines 49 and 54 (calling line 10). Here is the resulting timeline image.
When I move the timer start to after the initial response (so I am only timing the downloading of the contents), the timeline looks a lot more realistic. Note, these are two separate runs, but no code change aside from where the timer is started. Instead of having the startTime measured directly before use! response = request.AsyncGetResponse(), I have it directly afterwards.
To further support my claim, I made a timeline with Fiddler2. Here is the resulting timeline. Clearly the requests aren't starting exactly when I tell them to.
GetResponseStream in a new thread
In other words, synchronous requests and download calls are made in a secondary thread. This does work, since GetResponseStream respects the Timeout property on the WebRequest object. But in the process, we lose all of the waiting time as the request is on the wire and the response hasn't come back yet. We might as well write it in C#... ;)
Questions
Is this a known problem?
Is there any good solution that takes advantage of F# asynchronous workflows and still allows timeouts and error handling?
If the problem is really that I am making too many requests at once, then would the best way to limit the number of requests be to use a Semaphore(5, 5) or something like that?
Side Question: if you've looked at my code, can you see any stupid things I've done and could fix?
If there is anything you are confused about, please let me know.
AsyncGetResponse simply ignores any timeout value set on the request... here's a solution we just cooked up:
open System
open System.IO
open System.Net
type Request = Request of WebRequest * AsyncReplyChannel<WebResponse>
let requestAgent =
    MailboxProcessor.Start <| fun inbox -> async {
            while true do
                let! (Request (req, port)) = inbox.Receive ()
                async {
                    try
                        let! resp = req.AsyncGetResponse ()
                        port.Reply resp
                    with
                    | ex -> sprintf "Exception in child %s\n%s" (ex.GetType().Name) ex.Message |> Console.WriteLine
                } |> Async.Start
        }
let getHTML url =
    async {
        try
            let req = "http://" + url |> WebRequest.Create
            try
                use! resp = requestAgent.PostAndAsyncReply ((fun chan -> Request (req, chan)), 1000)
                use str = resp.GetResponseStream ()
                use rdr = new StreamReader (str)
                return Some <| rdr.ReadToEnd ()
            with
            | :? System.TimeoutException ->
                req.Abort()
                Console.WriteLine "RequestAgent call timed out"
                return None
        with
        | ex ->
            sprintf "Exception in request %s\n\n%s" (ex.GetType().Name) ex.Message |> Console.WriteLine
            return None
    } |> Async.RunSynchronously;;
getHTML "www.grogogle.com"
I.e., we're delegating to another agent and calling it with an async timeout... if we do not get a reply from the agent in the specified amount of time, we abort the request and move on.
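The "delegate, then wait for the reply with a timeout" shape is not specific to F#. As a rough illustration in Rust using only the standard library, the sketch below runs the work on a helper thread and waits on a channel with recv_timeout; with_timeout is a name made up here. Unlike the code above, it has no equivalent of req.Abort(), so the helper thread simply keeps running after the caller gives up:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Runs `work` on a helper thread and stops waiting for it after `timeout`.
// Returns None on timeout (or if the helper thread panicked before replying).
fn with_timeout<T, F>(timeout: Duration, work: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the caller timed out; ignore that.
        let _ = tx.send(work());
    });
    rx.recv_timeout(timeout).ok()
}

fn main() {
    let fast = with_timeout(Duration::from_millis(500), || "page body".to_string());
    let slow = with_timeout(Duration::from_millis(20), || {
        // Simulates a server that takes too long to respond.
        thread::sleep(Duration::from_millis(300));
        "too late".to_string()
    });
    println!("{:?} {:?}", fast, slow);
}
```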
I see my other answer may fail to answer your particular question... here's another implementation of a task limiter that doesn't require the use of a semaphore.
open System
type IParallelLimiter =
    abstract GetToken : unit -> Async<IDisposable>
type Message =
    | GetToken of AsyncReplyChannel<IDisposable>
    | Release
let start count =
    let agent =
        MailboxProcessor.Start(fun inbox ->
            let newToken () =
                { new IDisposable with
                    member x.Dispose () = inbox.Post Release }
            let rec loop n = async {
                let! msg = inbox.Scan <| function
                    | GetToken _ when n = 0 -> None
                    | msg -> async.Return msg |> Some
                return!
                    match msg with
                    | Release ->
                        loop (n + 1)
                    | GetToken port ->
                        port.Reply <| newToken ()
                        loop (n - 1)
            }
            loop count)
    { new IParallelLimiter with
        member x.GetToken () =
            agent.PostAndAsyncReply GetToken }
let limiter = start 100;;
for _ in 0..1000 do
    async {
        use! token = limiter.GetToken ()
        Console.WriteLine "Sleeping..."
        do! Async.Sleep 3000
        Console.WriteLine "Releasing..."
    } |> Async.Start
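The token-that-releases-on-dispose idea can be sketched in other languages too. The Rust version below is only an approximation of the agent above: it replaces the mailbox with a std::sync::mpsc channel pre-filled with permits, and uses Drop where the F# code uses IDisposable. Limiter, Token and max_concurrency are names invented for this example, and the threads block while waiting rather than suspending asynchronously:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// Returns its permit to the pool when dropped, mirroring the IDisposable token.
struct Token {
    pool: Sender<()>,
}

impl Drop for Token {
    fn drop(&mut self) {
        // If the limiter is already gone there is nothing to release to.
        let _ = self.pool.send(());
    }
}

// Hypothetical limiter: a channel that starts out holding `count` permits.
struct Limiter {
    available: Mutex<Receiver<()>>,
    pool: Mutex<Sender<()>>,
}

impl Limiter {
    fn start(count: usize) -> Arc<Limiter> {
        let (tx, rx) = mpsc::channel();
        for _ in 0..count {
            tx.send(()).expect("receiver is alive");
        }
        Arc::new(Limiter { available: Mutex::new(rx), pool: Mutex::new(tx) })
    }

    // Blocks until a permit is free, then wraps it in a Token.
    fn get_token(&self) -> Token {
        self.available.lock().unwrap().recv().expect("limiter holds a sender");
        Token { pool: self.pool.lock().unwrap().clone() }
    }
}

// Runs `tasks` threads through a limiter of size `limit` and reports the peak overlap.
fn max_concurrency(limit: usize, tasks: usize) -> usize {
    let limiter = Limiter::start(limit);
    let running = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..tasks)
        .map(|_| {
            let (limiter, running, peak) = (limiter.clone(), running.clone(), peak.clone());
            thread::spawn(move || {
                let _token = limiter.get_token();
                let now = running.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(5));
                running.fetch_sub(1, Ordering::SeqCst);
                // _token is dropped here, which hands the permit back.
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}

fn main() {
    println!("peak concurrency: {}", max_concurrency(3, 20));
}
```

Because the permit is returned in Drop, a task that exits early or panics still releases its slot, which is the same property the use! binding gives the F# version.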