Elixir waiting until any of multiple concurrent tasks completes - asynchronous

I have a function that posts some data over HTTP to a host and returns an :ok or :error tuple response. I want to create some code that calls the function multiple times concurrently, posting the same data to different hosts distributed around the world, but with the following behaviour:
It waits until the first successful response is received, at which point my code returns with that successful response. At this point I don't want to kill the outstanding tasks, but I don't care whether they succeed or not.
OR it waits for every single request to fail, at which point my code should respond with an error.
For example I can send multiple requests concurrently using Task.async:
pids =
  [host1, host2, host3]
  |> Enum.map(&Task.async(MyModule, :post_data, [&1, payload]))
I can wait for all tasks to finish using Task.yield_many, but what I can't wrap my head around is how to wait for just any of the tasks to be successful. I kind of need a Task.yield_any. Any ideas how to approach this?

The easiest way would probably be to start the tasks linked and trap exits; once any task terminates, your code receives an exit message (or a :DOWN message if you monitor the tasks, as Task.async does).
The other way round would be to make your tasks send a message back to the parent process right before exiting.
Or, as a last resort, you might run Task.yield_many/2 in a loop with a fairly small timeout, although that would be somewhat counter-idiomatic.
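For illustration, here is a minimal sketch of that message-based approach. The module name FirstSuccess and the function post_data_any/2 are made up; MyModule.post_data/2 is the function from the question and is assumed to return {:ok, _} or {:error, _} rather than raise (if a task can crash, start it with Task.Supervisor.async_nolink/3 and handle the extra :DOWN messages as well):

defmodule FirstSuccess do
  def post_data_any(hosts, payload) do
    Enum.each(hosts, &Task.async(MyModule, :post_data, [&1, payload]))
    await_first(length(hosts))
  end

  # Every task replied with an error.
  defp await_first(0), do: {:error, :all_failed}

  defp await_first(pending) do
    # Task.async sends a {ref, result} message back to the caller.
    receive do
      {ref, {:ok, _} = success} when is_reference(ref) ->
        # Stop monitoring this task and flush its :DOWN message.
        # The remaining tasks keep running; their replies are ignored.
        Process.demonitor(ref, [:flush])
        success

      {ref, {:error, _}} when is_reference(ref) ->
        Process.demonitor(ref, [:flush])
        await_first(pending - 1)
    end
  end
end

Note that the late replies of the ignored tasks stay in the caller's mailbox, so this is best run from a short-lived process (for example, inside its own Task).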

Related

How does Dart know when a program has finished in relation to asynchronous code

I've been learning about the event loop and asynchronous code execution in Dart. Here is how I understand it works:
Dart runs the synchronous code in main first.
Any completed futures are put on the event queue and get run in FIFO order.
Any microtask events get handled before the next event in the event queue.
The part I'm missing is what about an uncompleted future? What if you have something like this:
final myFuture = Future<int>.delayed(
  Duration(seconds: 1),
  () => 42,
);
In that one second delay, say all the synchronous code, microtasks, and event queue tasks have completed. How does Dart know it shouldn't terminate the program? Is there a little holder somewhere for uncompleted futures?
I don't know if my title accurately reflects what I'm asking. Feel free to edit it.
The lifetime of a Dart program depends on where it runs.
Code running in a browser doesn't stop unless the page goes away. Web pages won't end like a script or application does.
So, what you are likely really asking is when a program run on the Dart VM stops. The short answer is: When there is nothing more to do (in the main isolate).
The program ends when the main isolate is done. That's the first isolate to run. If you start new isolates, they won't keep the program alive by themselves, so make sure to keep the main isolate alive until your program is done.
An isolate is done when there is nothing more to do, and there is no good way to receive instructions to do something more.
That means:
There is no code running.
The event and microtask queues are empty.
There are no scheduled timers or outstanding I/O operations (you can think of these as being kept in separate internal work queues, and when they have a result ready, they trigger events in the main event queue. These queues also need to be empty for the program to be done).
There are no open receive ports (ReceivePort, RawReceivePort).
That means that there is nothing to do, and no open communication channels to receive messages on.
The traditional way to keep an isolate alive, perhaps because the real computation is being done in a different isolate, is to create a ReceivePort, and then close it when everything else is done (which the other isolates would probably signal by sending an event to that receive port).

HTTP Response sent before async call returns

I have yet to understand the behaviour of a web server thread when I make an async call to, say, a database, and immediately return a response (say, OK) to the client without even waiting for the async call to return. First of all, is it a good approach? What happens to the thread that made the async call if it is reused to serve another request and the earlier async call then returns to that particular thread? Or does the web server hold the thread until the async call it made returns? In that case many hanging threads would be left open while the web server remains available to take more requests. I am looking for an answer.
It depends on the way your HTTP servers works. But you should be very cautious.
Let's say you have a main event loop taking care of incoming HTTP connections, and workers threads which manage the HTTP communications.
A worker thread should be considered ready to accept a new HTTP request only when it is effectively and completely ready for that.
In terms of pure HTTP, the most important thing is to avoid sending a response before having received the whole query. It seems simple, and it's usually the case. But if the query has a body, which may be a chunked body, it could take time to receive the whole message.
You should never send a response before that point, unless it's something like a 400 Bad Request response followed by a real TCP/IP connection close. If you do, and you have a message-length parsing issue, the fact that you sent a response before the end of the query may lead to security problems. It could be used to exploit differences in message parsing between your server and any other HTTP agent in front of it (SSL terminator, reverse proxy, etc.) in some sort of HTTP smuggling issue: for that agent, the fact that you sent a response means you had the whole message, so it can send the next message, which your server will in fact treat as just another part of the body.
Now, if you have the whole message, you can decide to send an early response and detach an asynchronous task to do the real work. But this means:
you have to assume that no more output should be generated: you will not try to send any further output to the request issuer, and you should consider the communication closed
the worker thread should not receive new requests to manage, and this is the hard part. If this thread is marked as available for a new request, it may also be killed by the thread manager (Nginx and Apache keep request counters associated with workers and kill them after a limit is reached, to create fresh ones), it may receive a graceful reload command (usually a kill), etc.
So you start to enter a zone where you need to know the internals of the HTTP server, which may or may not be managed by you, and where changes may appear sooner or later. And you start to do very strange things, which usually leads to strange issues that are hard to reproduce.
Usually the best way to handle asynchronous tasks, while still being able to understand what happens, is to use a messaging system. Put the tasks in a queue, and have a separate asynchronous worker process consume them. Track the status of these tasks if you need to.
The same may apply to the client: after receiving a very fast HTTP answer, it may need to do some Ajax polling for the task status, and you may then only have to check the status of the task in the queue to send a response.
You will get more control over the whole thing.
Personally, I really dislike detached threads coming from strange code, performing heavy tasks with no way of reporting status or errors, and perhaps preventing a clean application shutdown (still waiting for strange threads to join) short of a killall.
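To make the queue idea concrete in Elixir (the language of the main question on this page), here is a hypothetical sketch; JobQueue, enqueue/1, and status/1 are illustrative names, and a real system would persist jobs in a proper queue or broker rather than in GenServer state:

defmodule JobQueue do
  use GenServer

  def start_link(_opts \\ []) do
    GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  end

  # Enqueue a job and return an id immediately; the HTTP handler can
  # hand that id to the client without waiting for the job to finish.
  def enqueue(fun), do: GenServer.call(__MODULE__, {:enqueue, fun})

  # Status polling: the client's follow-up request checks this.
  def status(id), do: GenServer.call(__MODULE__, {:status, id})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:enqueue, fun}, _from, state) do
    id = System.unique_integer([:positive])  # serializable job id
    server = self()
    # Run the job in its own process and report the result back.
    spawn(fn -> send(server, {:done, id, fun.()}) end)
    {:reply, id, Map.put(state, id, :running)}
  end

  def handle_call({:status, id}, _from, state) do
    {:reply, Map.get(state, id, :unknown), state}
  end

  @impl true
  def handle_info({:done, id, result}, state) do
    {:noreply, Map.put(state, id, {:done, result})}
  end
end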
It depends whether this asynchronous operation performs something which the client should be notified about.
If you return 200 OK (i.e. successfully completed) and later the asynchronous operation fails then the client will not know about the error.
You of course have some options, like sending some kind of push notification over a websocket, or having the client send another request that returns the actual result, and things like that. So basically it depends on your needs...

Elixir: what's the point of Async HTTP?

I'm used to languages where the request handlers run on a thread, so all I/O functions have an async version to prevent blocking the thread.
In Elixir, each request is handled in a lightweight process (an actor?), and the runtime can multiplex thousands of actors onto a single OS thread. If an actor blocks, the runtime swaps in another actor to use the CPU. Since an actor can block without blocking the thread, I don't see the point of async functions in Elixir. Yet I came across this in the HTTPotion documentation:
iex> HTTPotion.get "http://floatboth.com", [], [stream_to: self]
%HTTPotion.AsyncResponse{id: {1372,8757,656584}}
What's the point of an async function here?
Per the README for HTTPotion, using stream_to will cause messages to be sent to the provided pid for each chunk of the HTTP response. You could use a receive block to accept these and handle them accordingly.
In general, it doesn't make sense to say "In Elixir, each request is handled..." because request means a very specific thing; I'll assume this is about a web app handling inbound requests.
In Elixir, each process is a chunk of code executed in order. When that chunk of code is finished, the process dies. One use of async responses in HTTPotion is selective receive, where you want to process things as fast as possible but messages matching a certain pattern take precedence. Selective receive is one of the benefits of Erlang's approach to concurrency over, for instance, Go's (CSP).
Hope this is helpful. The point is: an actor can block without blocking the OS-level thread, but sometimes you want a given message to take priority, and in that case selective receive from the mailbox has substantial value. For instance, imagine one of the possible messages is equivalent to "shucks, because of this value I know I don't care about this HTTP response at all." Then you could shut down the process and not waste CPU processing the (earlier-received, CPU-intensive) messages that had queued up in the mailbox.
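As a sketch of that selective receive, assuming the %HTTPotion.AsyncHeaders{}, %HTTPotion.AsyncChunk{} and %HTTPotion.AsyncEnd{} messages described in HTTPotion's README (the :abort message and the Collector module are made up for illustration):

defmodule Collector do
  def collect(%HTTPotion.AsyncResponse{id: id}), do: collect(id, [])

  defp collect(id, acc) do
    # First check for an already-queued :abort with a zero timeout, so it
    # takes priority over chunks that arrived in the mailbox before it.
    receive do
      :abort -> {:aborted, Enum.reverse(acc)}
    after
      0 ->
        receive do
          :abort ->
            {:aborted, Enum.reverse(acc)}

          %HTTPotion.AsyncHeaders{id: ^id} ->
            collect(id, acc)

          %HTTPotion.AsyncChunk{id: ^id, chunk: chunk} ->
            collect(id, [chunk | acc])

          %HTTPotion.AsyncEnd{id: ^id} ->
            {:ok, acc |> Enum.reverse() |> IO.iodata_to_binary()}
        end
    end
  end
end

Collector.collect(HTTPotion.get("http://floatboth.com", stream_to: self())) would then stream the body while staying responsive to an :abort sent to the process from elsewhere.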

Continue execution asynchronously after return statement

Is there any way to achieve this?
My actual requirement is: I want to return success from my REST service as soon as I get the data and perform a basic action on it, and then continue with some more operations on the data.
I thought of two approaches:
Threading - currently I don't know how I would implement this with threads.
Bulk update - I could schedule a task that does all this processing an hour or so later.
But I am not very sure how I should start implementing this.
Any help?
In the context of HTTP we have to remember that once we finish the response, the conversation is over. There is really no way to keep the request alive after you have given the client a response. You would need to implement some worker process that runs outside of HTTP that processes things "queued up" or "triggered" by an HTTP request.
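In the Elixir terms used elsewhere on this page, a worker "triggered by an HTTP request" could look like the following hypothetical Plug handler; MyApp.TaskSupervisor and do_more_work/1 are illustrative names, and note that the whole body is read before replying, per the warning in the earlier answer:

defmodule MyApp.IngestPlug do
  import Plug.Conn

  def init(opts), do: opts

  def call(conn, _opts) do
    {:ok, body, conn} = read_body(conn)

    # Fire-and-forget: the response below does not wait for this task,
    # and a crash in the task will not take the request process down.
    Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
      do_more_work(body)
    end)

    send_resp(conn, 200, "OK")
  end

  defp do_more_work(_body), do: :ok  # placeholder for the real processing
end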

WF4 -Get exceptions when using multiple ReceiveAndSendReply activities paralleled

I have a scenario where I want multiple ReceiveAndSendReply activities running in parallel, each of them inside an infinite While loop to make sure all activities are always running and listening. So I used a Parallel activity to hold all the ReceiveAndSendReply activities, each one inside a While activity with its condition set to true. And of course, I put some business-logic activities between each Receive activity and its SendReplyToReceive activity.
Now I have a problem: if one branch takes a long time to process a request, all other branches are blocked during that time. Requests for the other Receive activities are not processed, and clients get exceptions, both the one that called the long-running service and any that call the other services while the long-running one is being processed.
Does anybody have an idea how to fix this? Sorry, since I am a new user I can't post an image of my workflow.
The workflow runtime is single threaded in that a given workflow instance only executes on a single thread at any given moment. So while your workflow is busy doing work, it can't react to other incoming messages. Normally this isn't a problem, as workflows usually aren't compute-intensive and doing async IO is easy. One thing that might help is adding Delay activities with a very short timeout; they cause the workflow to pause, letting it start processing the next request. Also make sure you put as few activities as possible between the Receive and the SendReply, and add a short Delay right after the SendReply.
